4. Empirical Analysis
4.1 Data
4.1.2 Descriptive Statistics
In this subsection, descriptive statistics are used to characterize the distribution and dispersion of the study sample. The sample is a panel of 4,611 routes among 318 airports in East Asia from July to December 2017, with 584,968 observations in total. The variables are classified into continuous and categorical variables.
For continuous variables, the mean, median, minimum, and maximum are used to describe the distribution and skewness of each variable, and the standard deviation and variation coefficient are used to describe its dispersion.
For categorical variables, the percentage of each category is calculated to describe the distribution across categories. The descriptive statistics of the study variables are listed in Table 4.2.
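The indicators named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `describe` and `category_percentages` are hypothetical, the input values are made up rather than drawn from the study sample, and the sample standard deviation (n - 1 denominator) is assumed because the text does not say which form was used.

```python
import statistics
from collections import Counter


def describe(values):
    """Summary indicators for one continuous variable."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd = statistics.stdev(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "cv": sd / mean,  # variation coefficient; undefined when mean is 0
        "min": min(values),
        "max": max(values),
    }


def category_percentages(labels):
    """Share of observations in each category, in percent."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Made-up values for illustration only; not taken from the study sample.
toy_delays = [1, 2, 3, 4, 10]
toy_slots = ["Level 1", "Level 1", "Level 3", "Level 2"]

summary = describe(toy_delays)
shares = category_percentages(toy_slots)
print(summary)
print(shares)
```

In the toy data the mean (4) exceeds the median (3), which is the same pattern the text reads as right-skewness.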
1. Dependent variable
The dependent variable, the daily average arrival delay of a route (DELAY), is right-skewed: the mean is around 21 minutes, whereas the median is 5 minutes, which implies that most observations in the study sample fall below the mean. The variation coefficient of DELAY is around 3, which means that the sample is widely dispersed and that the delays of the observations vary over a wide range.
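As a quick consistency check on this reading, the variation coefficient can be recomputed from the mean and standard deviation reported for DELAY in Table 4.2. The sketch below is illustrative only: the helper names are hypothetical, and comparing the mean with the median is a rough heuristic for the direction of skew, not a formal skewness statistic.

```python
def variation_coefficient(mean, sd):
    """Variation coefficient: standard deviation divided by the mean."""
    return sd / mean


def skew_direction(mean, median):
    """Rough heuristic: a mean above the median suggests a right skew."""
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "symmetric"


# Values reported for DELAY in Table 4.2 (mean, median, standard deviation).
delay_cv = variation_coefficient(mean=21.1231, sd=63.4804)
delay_skew = skew_direction(mean=21.1231, median=5.0)

print(round(delay_cv, 2), delay_skew)
```

The recomputed ratio agrees with the variation coefficient of about 3 reported in the table.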
2. Three perspective independent variables
(1) Airport variables
An HHI between 1,500 and 2,500 is usually classified as a moderately concentrated market [4] (U.S. Department of Justice and Federal Trade Commission, 2010). The means of the HHI of origin and destination airports (HHI_O and HHI_D) are around 2,150, with a right-skewed distribution and a variation coefficient less than 1, indicating that most airports in the sample are moderately concentrated or fairly competitive.
[4] According to the Horizontal Merger Guidelines issued by the U.S. Department of Justice and the Federal Trade Commission on August 19, 2010, an HHI smaller than 1,500 is generally classified as an unconcentrated market, between 1,500 and 2,500 as a moderately concentrated market, and above 2,500 as a highly concentrated market.
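The three concentration bands in the footnote can be written as a small lookup. This is a minimal sketch: the function name `classify_hhi` is hypothetical, and treating the boundary values 1,500 and 2,500 as part of the moderate band is an assumption, since the footnote does not say which band the exact boundaries belong to.

```python
def classify_hhi(hhi):
    """Map an HHI value (0 to 10,000) to the concentration bands in the footnote."""
    if hhi < 1500:
        return "unconcentrated"
    if hhi <= 2500:  # assumption: both boundary values count as moderate
        return "moderately concentrated"
    return "highly concentrated"


# Mean values reported in Table 4.2.
airport_band = classify_hhi(2152.1934)  # mean of HHI_O
route_band = classify_hhi(7906.7955)    # mean of HHI_R

print(airport_band)
print(route_band)
```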
Table 4.2. Descriptive statistics of variables of study sample

Continuous variable           Mean         Median            SD      CV        Min          Max
DELAY (min.)               21.1231         5.0000       63.4804  3.0053  -327.0000    2032.0000
Airport variables
HHI_O                    2152.1934      2033.6129     1335.1922  0.6204   140.6229   10000.0000
HHI_D                    2151.2778      2028.5500     1335.4971  0.6208   140.6229   10000.0000
HUB_O                      48.4789        38.0000       42.6993  0.8808     1.0000     190.0000
HUB_D                      48.2244        38.0000       42.5934  0.8832     0.0000     190.0000
Route variables
HHI_R                    7906.7955     10000.0000     2780.7349  0.3517  1111.1111   10000.0000
FLIGHT_R                    3.3028         2.0000        4.8649  1.4730     1.0000     103.0000
DISTANCE_R (km)          1087.6882       998.7064      597.9344  0.5497    12.6574    3695.2852
Network variables
AVGDELAY_C (min)           21.8511        13.9388       19.5657  0.8954   -27.0000     171.2859
FLIGHT_C                 9957.3653     10279.0000     3479.7692  0.3495     2.0000   19862.0000
AVGDELAY_CR (min)          22.8175        12.4847       27.0193  1.1842   -38.0000     279.2174
FLIGHT_CR                 580.9203       509.0000      341.2064  0.5874     2.0000    2422.0000
Demand variables
GDP (US dollars)        32969.5220     24431.2112    22503.0901  0.6825  8712.8745  131616.4384
POP (person)         15875953.7177  14399259.0000  8665508.0284  0.5458     233019     54360170
UNEMPLOY (%)                3.2811         3.3512        0.6342  0.1933     1.4033       4.5410

SD: standard deviation; CV: variation coefficient.

Categorical variable     Category percentage [a]
Airport variables
SLOT_O                   Level 1 (base): 62.65%; Level 2: 4.01%; Level 3: 33.34%
SLOT_D                   Level 1 (base): 62.78%; Level 2: 4.01%; Level 3: 33.21%
Month variables          July: 17.18%; August: 17.44%; September: 16.39%; October: 17.05%; November: 16.29%; December (base): 15.65%
Day-of-a-week variables  Monday: 14.16%; Tuesday: 14.16%; Wednesday (base): 14.05%; Thursday: 14.11%; Friday: 14.23%; Saturday: 14.65%; Sunday: 14.65%
Country variables
Origin country           CN: 74.52%; HK: 1.47%; JP: 16.59% (base); KR: 3.67%; MO: 0.37%; TW: 3.38%
Dest. country            CN: 74.52%; HK: 1.49%; JP: 16.55% (base); KR: 3.69%; MO: 0.37%; TW: 3.38%

[a] Category notations are defined in Table 3.4.
The means of the hubness of origin and destination airports (HUB_O and HUB_D) are around 48 connections, which corresponds to a medium hub [5]. The distribution of the sample is right-skewed, which means that most airports in the sample are small or medium hubs, probably because this study includes airports of all sizes.
[5] According to Mayer and Sinai (2003) and Rupp (2009), small, medium, and large hubs are airports with 26-45, 46-70, and 71+ destinations, respectively; Santos and Robin (2010) instead defined these three levels as 15-44, 45-69, and 70+ destinations.
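The two hub-size schemes cited in the footnote can be compared side by side with a small classifier. This is an illustrative sketch: the names `HUB_SCHEMES` and `classify_hub` and the label for airports below the smallest threshold are assumptions, because the footnote does not name that group.

```python
# Lower bounds (number of destinations) for small, medium, and large hubs.
HUB_SCHEMES = {
    "mayer_sinai": (26, 46, 71),   # Mayer and Sinai (2003); Rupp (2009)
    "santos_robin": (15, 45, 70),  # Santos and Robin (2010)
}


def classify_hub(destinations, scheme="mayer_sinai"):
    """Return the hub size class for a given number of destinations."""
    small, medium, large = HUB_SCHEMES[scheme]
    if destinations >= large:
        return "large"
    if destinations >= medium:
        return "medium"
    if destinations >= small:
        return "small"
    # Assumption: the sources give no label for airports below the small band.
    return "below small-hub threshold"


# Mean (about 48) and median (38) of HUB_O reported in Table 4.2.
for scheme in HUB_SCHEMES:
    print(scheme, classify_hub(48.4789, scheme), classify_hub(38, scheme))
```

Under both schemes the mean falls in the medium band and the median in the small band, consistent with the right skew described above.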
For the slot control levels of origin and destination airports (SLOT_O and SLOT_D), which are categorical variables, around 63% of the observations depart from or arrive at Level 1 (non-coordinated) airports because this study covers all airports in the region, even small ones, whereas around 33% depart from or arrive at Level 3 (fully coordinated) airports.
(2) Route variables
For the HHI of a route (HHI_R), unlike HHI_O and HHI_D, the mean is close to 8,000. In addition, the distribution of the sample is left-skewed, with a median of 10,000 (the value for a route served by a single carrier), and the variation coefficient is around 0.4. Hence, most of the sample lies on routes with a high degree of market concentration.
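For reference, the HHI is the sum of squared market shares expressed in percent, so it reaches 10,000 when a single carrier holds the whole market. The sketch below illustrates the arithmetic only: the function name `hhi` is hypothetical, and using each carrier's flight count as the basis for shares is an assumption, not a statement of how HHI_R was constructed in this study.

```python
def hhi(carrier_volumes):
    """Herfindahl-Hirschman Index from per-carrier volumes (e.g., flights)."""
    total = sum(carrier_volumes)
    # Shares are expressed in percent, so the index ranges up to 10,000.
    return sum((100.0 * v / total) ** 2 for v in carrier_volumes)


monopoly = hhi([4])         # one carrier operates every flight
duopoly = hhi([3, 1])       # 75% and 25% shares: 5,625 + 625
nine_equal = hhi([1] * 9)   # nine carriers with equal shares

print(monopoly, duopoly, round(nine_equal, 4))
```

Nine equal shares give about 1,111.11, which is one configuration consistent with the minimum of HHI_R in Table 4.2; other share patterns cannot be ruled out from the summary statistics alone.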
The mean of the number of flights on a route (FLIGHT_R) is around 3.3. The sample is right-skewed, and the variation coefficient is above 1, implying that the distribution is dispersed.
For the flying distance of a route (DISTANCE_R), probably because the observations of this study are the routes in the East Asia region, the average flying distance is around 1,100 km (as a reference, the flying distance from Taiwan Taoyuan International Airport [TPE] to Beijing Capital International Airport [PEK] is 1,718 km).
The variation coefficient is around 0.5, which means that the mass of the sample is concentrated instead of dispersed.
(3) Network variables
The average delays of connected airports (AVGDELAY_C) and connecting routes (AVGDELAY_CR) are around 22 minutes. This number is not far from that of the dependent variable (DELAY). The distributions for both variables are right-skewed, and the variation coefficients are around 0.9 and 1.2, respectively. The distribution of the former is more concentrated, whereas that of the latter is more dispersed.
Meanwhile, the distributions of the sample for the total number of flights at connected airports (FLIGHT_C) and on connecting routes (FLIGHT_CR) are less dispersed than those for AVGDELAY_C and AVGDELAY_CR. The difference in the number of flights may affect how much delay propagates from connected airports to the origin and destination airports of a route.
3. Control variables
Control variables in this study include demand, time, and country.
For demand variables, including the sum of annual GDP per capita (GDP), the sum of population (POP), and the weighted average unemployment rate (UNEMPLOY) of the origin and destination, the ranges of all three variables are large, probably because the study sample includes airports of all sizes, even small ones in rural areas. The mean and median values of GDP and POP are far above the minimum values, and the variation coefficients of the two variables are smaller than 1, which means that the observations are concentrated at origins and destinations with a high GDP and population. The distribution of UNEMPLOY is left-skewed, and its variation coefficient is smaller than 1, implying that the observations are also concentrated at origins and destinations with a relatively high unemployment rate.
For time variables, observations are evenly distributed among months, with July and August having slightly more observations than other months, which suggests that some routes operate more flights in these months. Similarly, observations are evenly distributed across the days of the week, with Friday, Saturday, and Sunday accounting for a slightly higher percentage than other days. Therefore, more flights are available from Friday to Sunday than on other days of the week.