4.4 Regression analysis
4.4.1 Testing Hypotheses 1 to 3
There are four models in Part A. We first examine whether $CumSelf_{ikt}$ has a positive relation to $WeeklySales_{ikt}$ and whether $CumOther_{ikt}$ has a negative relation. Model A1 investigates this directly. In Model A2, we add the basic performance information that people can see when buying tickets: $Period_k$, $Time_k$, $Region_k$, and $Year_k$. Model A3 further adds the interaction effects between this performance information and $CumSelf_{ikt}$.
We then add $Show_k$, $PerformanceId_k$, and $Week_{kt}$ as control variables to Model A3, which gives Model AD3, to control for the variation among performances. The effects of $CumSelf_{ikt}$ and $CumOther_{ikt}$ on $WeeklySales_{ikt}$ might not be linear, so Model A4 is proposed to examine this relationship. We take $Afternoon_k$ and $Central_k$ as the reference levels of the factors $Time_k$ and $Region_k$. Table 4.11 lists the variable names and their math notation.
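Model A1:
\[
WeeklySales_{ikt} = \beta_0 + \beta_1^{S} CumSelf_{ikt} + \beta_2^{S} CumOther_{ikt} + \varepsilon_{ikt}
\]
Model A2:
\[
WeeklySales_{ikt} = \beta_0 + \beta_1^{S} CumSelf_{ikt} + \beta_2^{S} CumOther_{ikt} + \beta_1^{P} Period_k + \cdots
\]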
Variable Name Math Notation
In Models A3 and AD3, we add the interaction effects, whose coefficients are denoted by $\beta_j^{XI}$, where $X \in \{S, P, T, R\}$.
In all three models, we find a significant positive relation between a price band's own cumulated sales and its sales speed, and a negative relation between the other price bands' cumulated sales and its sales speed (Table 4.12). Hypotheses 1 and 2 are therefore supported. As for Hypothesis 3, the results of Models A2 and A3 both show that the length of the sales period negatively affects sales speed, which supports Hypothesis 3; however, when the control variables $Show_k$, $PerformanceId_k$, and $Week_{kt}$ are added, the coefficient of $Period_k$ becomes positive. The coefficients of the $Week_{kt}$ levels are mostly negative relative to the level $Week_{k1}$, which indicates that the sales speed of Week 1 is normally the fastest.
We also consider whether the effects in Hypotheses 1 and 2 are more than linear. We add two variables, $CumSelf_{ikt}^2$ and $CumOther_{ikt}^2$, to Model A3 to obtain Model A4.
Model A4:
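\begin{align*}
WeeklySales_{ikt} = {} & \beta_0 + \beta_1^{S} CumSelf_{ikt} + \beta_2^{S} CumOther_{ikt} + \beta_1^{P} Period_k + \beta_2^{P} Week_{kt} \\
& + \beta_1^{T} Morning_k + \beta_2^{T} Evening_k + \beta_1^{R} Northern_k + \beta_2^{R} Southern_k + \beta_1^{Y} Year_k \\
& + \beta_1^{D} Show_k + \beta_2^{D} PerformanceId_k + \beta_1^{PI} Period_k \times CumSelf_{ikt} \\
& + \beta_1^{TI} Morning_k \times CumSelf_{ikt} + \beta_2^{TI} Evening_k \times CumSelf_{ikt} \\
& + \beta_1^{RI} Northern_k \times CumSelf_{ikt} + \beta_2^{RI} Southern_k \times CumSelf_{ikt} \\
& + \beta_3^{S} CumSelf_{ikt}^2 + \beta_4^{S} CumOther_{ikt}^2 + \varepsilon_{ikt}
\end{align*}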
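As a minimal sketch of how the regressors of Model A4 could be assembled for one observation, the following helper builds the dummy variables, the interactions with $CumSelf_{ikt}$, and the squared terms. The function name `model_a4_features`, the dictionary keys, and the sample values are hypothetical. It assumes that Afternoon and Central are the omitted reference levels, and it leaves out $Year_k$, $Show_k$, $PerformanceId_k$, and $Week_{kt}$ for brevity.

```python
def model_a4_features(cum_self, cum_other, period, time, region):
    """Build part of the Model A4 regressor row for one observation.

    Hypothetical helper. Afternoon and Central are treated as the omitted
    reference levels, so only Morning/Evening and Northern/Southern dummies
    appear. Year, Show, PerformanceId and Week are left out for brevity.
    """
    morning = 1.0 if time == "Morning" else 0.0
    evening = 1.0 if time == "Evening" else 0.0
    northern = 1.0 if region == "Northern" else 0.0
    southern = 1.0 if region == "Southern" else 0.0
    return {
        "Intercept": 1.0,
        "CumSelf": cum_self,
        "CumOther": cum_other,
        "Period": float(period),
        "Morning": morning,
        "Evening": evening,
        "Northern": northern,
        "Southern": southern,
        # Interactions of the performance information with CumSelf.
        "Period x CumSelf": period * cum_self,
        "Morning x CumSelf": morning * cum_self,
        "Evening x CumSelf": evening * cum_self,
        "Northern x CumSelf": northern * cum_self,
        "Southern x CumSelf": southern * cum_self,
        # Squared terms that allow a non-linear effect.
        "CumSelf^2": cum_self ** 2,
        "CumOther^2": cum_other ** 2,
    }


# Made-up observation: an evening performance in Central Taiwan.
row = model_a4_features(cum_self=0.5, cum_other=0.25, period=100,
                        time="Evening", region="Central")
```

Because Central is a reference level in this sketch, both region dummies and both region interactions are zero for this observation, so its region effect is absorbed by the intercept.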
The previous variables maintain the same effects in Model A4 (Table 4.12).

Regression      Model A1   Model A2   Model A3   Model AD3
Adjusted R^2    12.49%     18.51%     20.78%     39.60%
*** p < 0.001, ** p < 0.01, * p < 0.05
Table 4.12: The Results of Part A

If we hold all other variables fixed, the equation that considers only the effects of $CumSelf_{ikt}$ and $CumOther_{ikt}$ is
\begin{align*}
WeeklySales_{ikt} = {} & 0.469\, CumSelf_{ikt} - 0.157\, CumSelf_{ikt}^2 \\
& + 0.247\, CumOther_{ikt} - 0.369\, CumOther_{ikt}^2 .
\end{align*}
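The squared terms $CumSelf_{ikt}^2$ and $CumOther_{ikt}^2$ act as interactions of $CumSelf_{ikt}$ and $CumOther_{ikt}$ with themselves, so the marginal effect of each variable depends on its own level.
We obtain the first partial derivatives of $WeeklySales_{ikt}$ with respect to $CumSelf_{ikt}$ and $CumOther_{ikt}$: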
\[
\frac{\partial WeeklySales_{ikt}}{\partial CumSelf_{ikt}} = 0.469 - 0.314\, CumSelf_{ikt},
\]
\[
\frac{\partial WeeklySales_{ikt}}{\partial CumOther_{ikt}} = 0.247 - 0.738\, CumOther_{ikt}.
\]
$WeeklySales_{ikt}$ is concave in both $CumSelf_{ikt}$ and $CumOther_{ikt}$. The negative coefficient of $CumSelf_{ikt}^2$ means that the effect of $CumSelf_{ikt}$ continues to increase positively, but with a decreasing margin (Figure 4.3a). For $CumOther_{ikt}$, $WeeklySales_{ikt}$ first increases with a decreasing margin; then, when the cumulated sales of the other price bands are high, customers tend to have less incentive to purchase this price band's tickets (Figure 4.3b). This pattern results from the interaction of two effects. First, when people find that other price bands are selling well, they consider the show worth watching and purchase their most preferred price band of the show. Second, when people find that other price bands are selling well, they turn to buy the price bands with higher sales. From Figure 4.3b, we can conclude that the first effect is stronger at the start of the sales period and the second effect is stronger at the end of the sales period.
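The turning points implied by the fitted quadratic terms can be checked with a short calculation. The sketch below sets each first derivative to zero using the coefficients reported above; the function name `turning_point` is hypothetical. Reading the results requires an assumption that is not stated in this section, namely that $CumSelf_{ikt}$ and $CumOther_{ikt}$ are measured as fractions between 0 and 1.

```python
def turning_point(linear, quadratic):
    """Return x where d/dx (linear * x + quadratic * x**2) equals zero."""
    return -linear / (2.0 * quadratic)


# Coefficients taken from the reduced Model A4 equation in the text.
self_peak = turning_point(0.469, -0.157)    # solves 0.469 - 0.314 x = 0
other_peak = turning_point(0.247, -0.369)   # solves 0.247 - 0.738 x = 0

print(round(self_peak, 3), round(other_peak, 3))  # prints: 1.494 0.335
```

Under the stated assumption, the turning point for $CumSelf_{ikt}$ (about 1.494) lies outside the feasible range, so its effect stays positive throughout, while the turning point for $CumOther_{ikt}$ (about 0.335) lies inside the range, which matches the rise and subsequent fall described for Figure 4.3b.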
We also check the pairwise correlation coefficients between the variables (Figure 4.4) and find that no two variables are highly correlated.
(a) $CumSelf_{ikt}$ (b) $CumOther_{ikt}$
Figure 4.3: Change of $WeeklySales_{ikt}$.
Figure 4.4: Correlation Coefficients of Variables
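A pairwise correlation check of this kind can be sketched as follows. The helper names, the toy data, and the cut-off of 0.8 for "highly correlated" are all assumptions made for illustration; the text does not state which threshold was used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.8):
    """List variable pairs whose absolute correlation reaches the threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up toy data, not taken from the data set.
toy = {
    "CumSelf": [0.1, 0.2, 0.4, 0.7, 0.9],
    "CumOther": [0.3, 0.1, 0.5, 0.2, 0.4],
    "Period": [60, 120, 90, 200, 150],
}
flagged = highly_correlated_pairs(toy)
```

An empty `flagged` list corresponds to the situation reported here, in which no pair of variables exceeds the chosen cut-off.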
4.4.2 Testing Hypothesis 4
In Part B, we investigate whether the effect of one price band on another is stronger when their prices are closer to each other. The price bands need to be examined separately. We categorize the other price bands into two types: Near and Far. For $P_{1k}$, the near price band is $P_{2k}$ and the rest are far. For $P_{2k}$, the near price bands are $P_{1k}$ and $P_{3k}$. The $CumSold_{ikt}$ values for Near and Far are denoted by $CumNear_{ikt}$ and $CumFar_{ikt}$, respectively.
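The Near and Far grouping can be sketched as a small function. It assumes that "near" means the price bands immediately adjacent in price rank, which agrees with the two examples given for $P_{1k}$ and $P_{2k}$ but is an assumption for the remaining bands. The function name `split_near_far` is hypothetical.

```python
def split_near_far(band, num_bands):
    """Split the other price bands into Near (adjacent rank) and Far (the rest).

    Bands are numbered 1..num_bands in price order. Adjacency as the
    definition of "near" is an assumption generalized from the text.
    """
    if not 1 <= band <= num_bands:
        raise ValueError("band must be between 1 and num_bands")
    others = [b for b in range(1, num_bands + 1) if b != band]
    near = [b for b in others if abs(b - band) == 1]
    far = [b for b in others if abs(b - band) > 1]
    return near, far


print(split_near_far(1, 5))  # prints: ([2], [3, 4, 5])
print(split_near_far(2, 5))  # prints: ([1, 3], [4, 5])
```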
Model B:
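\begin{align*}
WeeklySales_{ikt} = {} & \beta_0 + \beta_1^{S} CumSelf_{ikt} + \beta_2^{S} CumNear_{ikt} + \beta_3^{S} CumFar_{ikt} \\
& + \beta_1^{P} Period_k + \beta_2^{P} Week_{kt} + \beta_1^{T} Morning_k + \beta_2^{T} Evening_k + \beta_1^{R} Northern_k + \beta_2^{R} Southern_k \\
& + \beta_1^{Y} Year_k + \beta_1^{D} Show_k + \beta_2^{D} PerformanceId_k + \beta_1^{PI} Period_k \times CumSelf_{ikt} \\
& + \beta_1^{TI} Morning_k \times CumSelf_{ikt} + \beta_2^{TI} Evening_k \times CumSelf_{ikt} \\
& + \beta_1^{RI} Northern_k \times CumSelf_{ikt} + \beta_2^{RI} Southern_k \times CumSelf_{ikt} + \varepsilon_{ikt}
\end{align*}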
From Table 4.13, we find that the relations of $CumSelf_{ikt}$, $CumNear_{ikt}$, and $CumFar_{ikt}$ to $WeeklySales_{ikt}$ follow the same pattern as in Models AD3 and A4. From Figure 4.5, we can also see that Model B has the same pattern as Model A4. $WeeklySales_{ikt}$ is concave in $CumSelf_{ikt}$, $CumNear_{ikt}$, and $CumFar_{ikt}$. For both $CumNear_{ikt}$ and $CumFar_{ikt}$, $WeeklySales_{ikt}$ first increases with a decreasing margin, then reaches a turning point and decreases with an increasing margin. Interestingly, $CumFar_{ikt}$ has a larger effect than $CumNear_{ikt}$. Hypothesis 4 is therefore not supported and can be investigated further in the future.
Regression Model B
Table 4.13: The Results of Part B
(a) $CumSelf_{ikt}$ (b) $CumNear_{ikt}$ (c) $CumFar_{ikt}$
Figure 4.5: Change of $WeeklySales_{ikt}$ for Model B.
Chapter 5
Application: Dynamic Seat Allocation
After testing our hypotheses, we propose a scenario in which our findings can be applied. When a show with several price bands has been selling tickets for some time, one can tell from the cumulated sales whether a specific price band is selling well. If one price band is selling very well and is about to sell out, while another is selling poorly and still has many tickets left, we can apply dynamic seat allocation. We consider a case in which a portion of the ticket capacity of the poorly selling price bands can be reassigned to the price band that is about to sell out, so that fewer tickets are likely to remain unsold in the end.
We use regression models to predict future sales and adjust the ticket capacity of the various price bands based on the prediction. Since we find that the cumulated sales of price bands affect customer utility, we compare the profit obtained with the correct regression against the profit obtained with a regression that neglects cumulated sales.
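The reallocation idea can be illustrated with a minimal sketch. It is not the procedure used in this chapter: the function name `reallocate_seats`, the greedy matching rule, and the `max_share` cap on how much of a donor band's predicted surplus may be moved are all hypothetical, and `predicted_demand` stands in for the output of a regression model.

```python
def reallocate_seats(capacity, sold, predicted_demand, max_share=0.5):
    """Move seats from bands with predicted surplus to bands with predicted shortage.

    capacity, sold, predicted_demand: dicts keyed by price band.
    predicted_demand is the forecast number of further tickets demanded.
    max_share caps the fraction of a donor's predicted surplus that may move.
    Returns a new capacity dict; the total capacity is unchanged.
    """
    remaining = {b: capacity[b] - sold[b] for b in capacity}
    shortage = {b: max(0, predicted_demand[b] - remaining[b]) for b in capacity}
    surplus = {b: int(max_share * max(0, remaining[b] - predicted_demand[b]))
               for b in capacity}
    new_capacity = dict(capacity)
    # Serve the largest predicted shortage first, from the largest surplus.
    for receiver in sorted(shortage, key=shortage.get, reverse=True):
        for donor in sorted(surplus, key=surplus.get, reverse=True):
            moved = min(shortage[receiver], surplus[donor])
            if moved <= 0:
                continue
            new_capacity[donor] -= moved
            new_capacity[receiver] += moved
            shortage[receiver] -= moved
            surplus[donor] -= moved
    return new_capacity


# Made-up example: P1 is nearly sold out, P2 is selling slowly.
capacity = {"P1": 100, "P2": 100}
sold = {"P1": 95, "P2": 20}
predicted_demand = {"P1": 25, "P2": 30}
new_capacity = reallocate_seats(capacity, sold, predicted_demand)
print(new_capacity)  # prints: {'P1': 120, 'P2': 80}
```

In this example P1 is predicted to fall 20 seats short, and P2 has a predicted surplus of 50 seats of which at most half may move, so 20 seats are shifted from P2 to P1.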
Number of Price Bands of a Performance   Four   Five   Six
Number of Performances                   54     158    112
Table 5.1: Number of Price Bands of Performances
As previously mentioned, the number of price bands varies across performances. We select some performances with five price bands as examples, because performances with five price bands are the most numerous in our data set (Table 5.1).
5.1 Exploratory data analysis
Following the same procedure as in Chapter 4, we conduct exploratory data analysis prior to implementing the application. The data for performances with five price bands contain 20,125 rows. We examine the sales of performances by adding up $NumSold_{ikt}$ over the five price bands. Then, we separately analyze each price band's $NumSold_{ikt}$ and $WeeklySales_{ikt}$. Because their distributions resemble each other, we list only the information for $NumSold_{ikt}$ in Table 5.2. The distributions are also all exponential and very similar; Figure A.2a is taken as an example. The variation in the cumulated sales of the five price bands is also large; Figure A.2b is taken as an example.
$Period_k$ is also right-skewed for the performances with five price bands. From Table 5.2, we can see that the variation of all the variables is high.
Variable                                Mean      Standard Deviation   Maximum   Minimum
$NumSold_{kt}$ of Performance $k$       35.378    66.779               728       −164
$NumSold_{1kt}$ of $P_{1k}$             3.409     8.664                110       −33
$NumSold_{2kt}$ of $P_{2k}$             4.250     9.655                138       −8
$NumSold_{3kt}$ of $P_{3k}$             3.669     8.500                100       −26
$NumSold_{4kt}$ of $P_{4k}$             3.637     7.994                71        −6
$NumSold_{5kt}$ of $P_{5k}$             2.964     7.795                123       −27
$CumP_{1kt} \times Capacity_{1k}$       57.699    56.872               261       0
$CumP_{2kt} \times Capacity_{2k}$       75.800    68.662               331       0
$CumP_{3kt} \times Capacity_{3k}$       70.460    74.712               549       0
$CumP_{4kt} \times Capacity_{4k}$       66.463    72.878               509       0
$CumP_{5kt} \times Capacity_{5k}$       50.686    54.196               391       0
$Period_k$                              161.224   92.433               359       33
Table 5.2: Exploratory Analysis of the Variables of the Second Part

$Time_k$ and $Region_k$ show patterns similar to those in the first part (Figures A.2d and A.2e). Morning performances are still the fewest (26). There are 58 evening performances, slightly fewer than the 59 afternoon performances. Northern Taiwan continues to be the location where most performances take place (111). Central Taiwan has 23 performances and Southern Taiwan has only 9.
The distribution of $Year_k$ for the performances with five price bands is very different from that of the first part. The number of performances with five price bands decreases progressively from 42 in 2008 to 34 in 2009, 13 in 2010, and 9 in 2011. In 2012, it suddenly rises to 45 (Figure A.2f).
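Summary statistics of the kind reported in Table 5.2 can be computed along the following lines. This is a sketch under assumptions: the helper name `summarize` is hypothetical, the sample values are made up, the sample standard deviation is used although the text does not say which variant was reported, and comparing the mean with the median is only a rough heuristic for right skew.

```python
import statistics


def summarize(values):
    """Return mean, standard deviation, maximum, minimum and a skew hint."""
    mean = statistics.mean(values)
    return {
        "mean": mean,
        "sd": statistics.stdev(values),  # sample standard deviation (assumption)
        "max": max(values),
        "min": min(values),
        # Rough heuristic: a mean above the median suggests right skew.
        "right_skewed": mean > statistics.median(values),
    }


# Made-up sales-period lengths in days, for illustration only.
toy_period = [33, 60, 90, 120, 150, 359]
summary = summarize(toy_period)
```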