
To assess the overall goodness-of-fit of our model, we can use the Cox and Snell residuals [10]. If the model is correctly specified, the random variable r_i = Ĥ(t_i, Z_i) will follow an exponential distribution with a hazard rate of 1, where Ĥ(t_i, Z_i) is the estimated cumulative hazard rate for session i with risk vector Z_i. Accordingly, a plot of the residuals r_i against −log Ŝ(r), where Ŝ(r) is the Kaplan-Meier estimate of the residuals' survival function, should be a straight line through the origin with a slope of 1. We can further validate our model by prediction;
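This diagnostic can be scripted directly from a fitted proportional-hazards model. The following is a minimal sketch using the Python lifelines package; the fitted model cph and the per-session table sessions with duration, observed, and covariate columns are assumed, and all names are illustrative rather than taken from this work.

    import numpy as np
    from lifelines import KaplanMeierFitter

    # Cox-Snell residual of each session: the estimated cumulative hazard
    # H_hat(t_i | Z_i) evaluated at the session's observed (or censored) time.
    cum_haz = cph.predict_cumulative_hazard(sessions)   # index: time, columns: sessions
    residuals = np.array([
        np.interp(t, cum_haz.index.values, cum_haz[col].values)
        for t, col in zip(sessions["duration"].values, cum_haz.columns)
    ])

    # If the model fits, the residuals behave like (censored) Exp(1) samples, so
    # -log of their Kaplan-Meier survival estimate, plotted against the residuals,
    # should follow the 45-degree line through the origin.
    kmf = KaplanMeierFitter().fit(residuals, event_observed=sessions["observed"])
    km_cum_haz = -np.log(kmf.survival_function_["KM_estimate"])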

that is, given a performance vector Z, we can predict the most probable session time as the median time of the estimated survival curve, i.e., inf{t : S(t|Z) ≤ 0.5}, where S(t|Z) = exp(−H(t|Z)) is the estimated survival function for a session with risk vector Z. Using this relation, one can sort and group all sessions by their risk scores, βᵀZ, and predict session times based on the median risk score in each group. The model can then be validated by comparing the predicted session times with the actual median times to determine whether the latter fall within a chosen confidence band of the prediction.
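The prediction step can be sketched under the same assumptions (a fitted lifelines model cph and a table new_sessions of performance vectors, both hypothetical names):

    # Predicted session time for each performance vector Z: the smallest t at
    # which the estimated survival S(t | Z) drops to 0.5.
    surv = cph.predict_survival_function(new_sessions)   # index: time, columns: sessions
    predicted_time = surv.apply(
        lambda s: s.index[s <= 0.5][0] if (s <= 0.5).any() else float("inf")
    )
    # lifelines also exposes this directly: cph.predict_median(new_sessions)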

Chapter 4 Case Study

In this chapter, we consider two case studies to demonstrate the complete procedure for quantifying user satisfaction. We first investigate online gaming, one of the most profitable businesses on the Internet; we then explore user satisfaction with VoIP, one of the most popular Internet services. By quantifying user satisfaction, service providers can deliver better service quality to users and thereby boost service subscriptions.

4.1 Online Game

The popularity of online gaming has increased rapidly in recent years; however, users still experience unfavorable network conditions. Numerous complaints about long or frequent lags are made in game-player forums. Thus, to understand online gamers' QoS-sensitivity, we investigate the relationship between gamers' playing times and network QoS factors.

We collected traces from ShenZhou Online [11], a commercial massively multiplayer online role-playing game (MMORPG). To play ShenZhou Online, thousands of players pay a monthly subscription fee at a convenience store or online. As in a typical MMORPG, players can fight random creatures, train to acquire particular skills, engage in commerce, or take on quests. Compared with other game genres, such as FPS (First-Person Shooter) games, MMORPGs are relatively slow-paced and have less stringent service requirements; they can therefore be viewed as a baseline for real-time interactive online games. In other words, if network QoS frustrates MMORPG players, it should also frustrate players of other game genres. With the help of the ShenZhou Online staff, we recorded all inbound and outbound traffic of the game servers located in Taipei, a total of 15,140 game sessions over two days. The observed players were spread over 13 countries, including China, India, Hong Kong, and Malaysia, and hundreds of autonomous systems, which demonstrates the heterogeneity of the network-path characteristics and the generality of the trace.

4.1.1 Performance Factor Identification

According to the theory proposed in [12], when users are playing a game, if the feeling of involvement in the virtual world is diminished by network lags, they become more conscious of the real world, which reduces their sense of time distortion. Therefore, we expect players' staying times in MMORPGs to be affected, to some extent, by the network's QoS. From the collected trace, we extracted the network performance of each session based on the sequence numbers and flags in the TCP packet headers.

Figure 4.1: Relationship between game session times and network QoS factors

On average, players stayed for 100 minutes after joining a game. However, the variation in individual game-playing times was quite large; for example, the shortest 20% of sessions lasted less than 40 minutes, while the top 20% of players spent more than eight hours continuously in the game. Fig. 4.1 illustrates the difference in the game-playing times of sessions that experienced different levels of network quality. The three plots depict the association of game-playing times with network latency, network delay variation (i.e., the standard deviation of network latency), and the network loss rate, respectively. All three plots indicate that the more serious the network impairment experienced by players, the earlier they were likely to leave the game. The changes in game-playing time are significant. For instance, gamers who experienced 150 msec latency played four hours on average, whereas those who experienced 250 msec latency played only one hour on average, a ratio of 4:1. Moreover, network delay variation and network loss induce even larger differences in game-playing times.
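For illustration, the per-session QoS summary described above can be computed along the following lines; this is only a sketch, and the argument names are hypothetical rather than the names used in our toolchain.

    import numpy as np

    def session_qos(rtt_samples, packets_sent, packets_lost):
        """Summarize one session's network QoS from its per-packet measurements."""
        latency = float(np.mean(rtt_samples))          # average network latency (seconds)
        delay_variation = float(np.std(rtt_samples))   # std. dev. of latency (seconds)
        loss_rate = packets_lost / packets_sent        # fraction of packets lost
        return latency, delay_variation, loss_rate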

4.1.2 Impact of Individual Factors

Having demonstrated that players are not only sensitive to network quality, but also reactive to it, we assess how each individual QoS factor influences players’ willingness to continue with a game or leave it.

Based on the methodology mentioned in Section 3.2, we propose a model that describes the changes in game-playing time due to network QoS. The model grades the overall quality of game playing based on specific network performance metrics, such as latency and loss, in terms of user satisfaction. The derived model takes network QoS factors as the input and computes the departure rate of online players as the output. The regression equation is derived as follows:

departure rate ∝ exp(1.6 × network latency + 9.2 × network delay variation + 0.2 × log(network loss rate)).   (4.1)


Figure 4.2: Actual vs. model-predicted game-playing time for session groups sorted by their risk scores.

Note that the logarithm of the network loss rate in the model is the result of the scale transformation described in Section 3.2.2.
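A model of this form can be fitted, for example, with the lifelines CoxPHFitter. The sketch below assumes a per-session table with hypothetical column names (duration in hours and departed marking uncensored sessions) and applies the logarithmic scale transformation noted above; it is an illustration, not the exact pipeline used in this work.

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    sessions = pd.read_csv("game_sessions.csv")                 # one row per session (hypothetical file)
    sessions["log_loss_rate"] = np.log(sessions["loss_rate"])   # scale transformation (Section 3.2.2)

    cph = CoxPHFitter()
    cph.fit(
        sessions[["duration", "departed", "latency", "delay_variation", "log_loss_rate"]],
        duration_col="duration",
        event_col="departed",
    )
    cph.print_summary()   # fitted coefficients play the role of the weights in Eq. (4.1)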

This shows that the player departure rate is roughly proportional to the exponent of a weighted sum of network performance metrics, where the weights reflect the effect of each type of network impairment. A coefficient can be interpreted through the ratio of the departure rates of two sessions. For example, suppose two players, A and B, join a game at the same time and experience similar levels of network loss and delay variation, except that their network latencies are 100 msec and 200 msec, respectively. The ratio of their departure rates is then exp(1.6 × (0.2 − 0.1)) ≈ 1.2, where 1.6 is the coefficient of network latency. That is, at every moment during the game, the probability that B will leave the game is about 1.2 times that of A.
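The arithmetic of this example is simply the exponentiated coefficient times the covariate difference, as the following check shows (values as in the text):

    import math

    beta_latency = 1.6                 # coefficient of network latency in Eq. (4.1)
    latency_a, latency_b = 0.1, 0.2    # 100 msec vs. 200 msec, expressed in seconds
    hazard_ratio = math.exp(beta_latency * (latency_b - latency_a))
    print(round(hazard_ratio, 2))      # ~1.17, i.e., roughly 1.2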

Given the strong relationship between game-playing times and network QoS factors, we can “predict” the former if we know the latter. Forecasting when a player will leave a game could provide useful hints for optimizing system performance and resource allocation. To validate our model, we compared the actual and model-predicted playing times for players in ShenZhou Online. In Fig. 4.2, we sort and group sessions by their risk scores, βᵀZ, and predict game-playing time with the method described in Section 3.2.3. The observed average game-playing times and the predicted times of each group are plotted in the figure. Note that the departure rate in Equation 4.1 is proportional to exp(βᵀZ), the exponential of the risk score. From the figure, we observe that, at the macro level, the prediction is rather close to the actual time, suggesting that a service provider can predict how long a given player will stay in a game and optimize resource allocation accordingly.
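The grouping behind Fig. 4.2 can be reproduced roughly as follows; this is a sketch under the same lifelines assumptions as before, and the bin count and column names are illustrative.

    import numpy as np
    import pandas as pd

    covariates = ["latency", "delay_variation", "log_loss_rate"]
    sessions["risk"] = sessions[covariates].values @ cph.params_[covariates].values  # beta^T Z
    sessions["risk_group"] = pd.qcut(sessions["risk"], q=10)

    validation = sessions.groupby("risk_group").agg(
        observed_mean_time=("duration", "mean"),
        median_risk=("risk", "median"),
    )
    # The predicted time for each group is then read off the survival curve
    # evaluated at the group's median risk score (Section 3.2.3).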

Note that while this methodology can be applied to all kinds of online games, the exact equation for players' QoS-sensitivity may depend on individual game design characteristics, such as the transport protocol and the client-side prediction techniques used.

4.1.3 Findings and Discussion

The above results highlight the fact that network delay variations are less tolerable than absolute delay. Therefore, while current network games rely primarily on a “ping time” to select a server for smooth game play, delay jitter should also be considered in the server-selection process. We also find that players are more sensitive to network loss rates than to network latency. However, a study of Unreal Tournament 2003 [13] reported that a typical network loss rate (< 6%) has no impact on user performance. We believe the difference is caused by the choice of the underlying transport protocol: while most FPS games transmit messages via UDP, many MMORPGs, including ShenZhou Online, use TCP. Since TCP provides in-order delivery and congestion control, a lost packet causes subsequent packets to be buffered until it is delivered successfully, and also shrinks TCP's congestion window. In contrast, packet loss incurs no such overhead in UDP. In short, for TCP-based online games, packet loss introduces additional packet delay and delay jitter, and therefore causes further annoyance to players. For this reason, and because of TCP's high communication overhead [14], we consider that more lightweight protocols would be more appropriate for real-time interactive network games.

4.2 VoIP

Among the various VoIP services, Skype is by far the most successful. There are over 200 million Skype downloads and approximately 85 million users worldwide.

However, fundamental questions, such as whether VoIP services like Skype are good enough in terms of user satisfaction, have not been formally addressed. In this section, we quantify Skype user satisfaction based on call durations measured from actual Skype traces, and propose an objective and perceptual index called the User Satisfaction Index (USI).

To collect Skype traffic traces, we set up a packet sniffer to monitor all traffic entering and leaving a campus network. In addition, to capture more Skype traces, a powerful Linux machine was set up to attract more relayed traffic through it during the course of the trace collection. However, given the huge amount of monitored traffic and the low proportion of Skype traffic, we used two-stage filtering to identify Skype VoIP sessions. In the first stage, we filtered and stored possible Skype traffic on disk; in the second stage, we applied an off-line identification algorithm to the captured packet traces to extract actual Skype sessions. Since we could not deduce round-trip times (RTT) and their jitter from packet traces alone, we sent out probe packets for each active flow while capturing Skype traffic. The trace was collected over two months in late 2005. We obtained 634 VoIP sessions, of which 462 were usable because they had more than five RTT samples; among these, 253 sessions were directly established and 209 were relayed.

Figure 4.3: Correlation of bit rate with session time

4.2.1 Performance Factor Identification

Skype uses a wideband codec that adapts to the network environment by adjusting the bandwidth used. Thus, when we explore the relationship between call duration and network conditions, we must also consider the source rate, along with network delay and loss. However, we do not have exact information about the source rate of remote Skype hosts, so we use the received data rate as an approximation of the source rate; for brevity, we refer to it as the bit rate. Fig. 4.3 shows the correlation between the bit rate and call duration, where the median times and their standard errors are plotted. The effect of the bit rate is clear: users tend to have longer conversations when the bit rate is higher. In fact, the median duration of the top 40% of calls is ten times longer than that of the shortest 15%.

Figure 4.4: Correlation of jitter with session time

We also consider the jitter and round-trip time (RTT) variables, where jitter is defined as the standard deviation of the bit rate sampled every second; it captures the level of network delay variation and packet loss. We observe that when network impairment is more serious, users are more likely to terminate a call. For instance, as shown in Fig. 4.4, users who experienced jitter of less than 1 Kbps talked for a median of 21 minutes, whereas users who experienced jitter of more than 2 Kbps talked for only 3 minutes, a ratio of 7:1.

Figure 4.5: Predicted vs. actual median duration of session groups sorted by their User Satisfaction Indexes.
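For reference, the jitter metric defined in this subsection is simply the following; the per-second byte counts passed in are a hypothetical input format.

    import numpy as np

    def bitrate_jitter(bytes_per_second):
        """Jitter as used here: std. dev. of the received bit rate sampled every second."""
        bit_rate_kbps = 8.0 * np.asarray(bytes_per_second, dtype=float) / 1000.0
        return float(bit_rate_kbps.std())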

4.2.2 Impact of Individual Factors

To understand the impact of individual factors, we use regression analysis to model call duration as a response to QoS factors. Although we could simply put all potential QoS factors into the regression model, the result would be ambiguous if the predictors were strongly interrelated [15]. In [2], we analyze the level of correlation between QoS factors and classify them into three collinear groups. We then pick the bit rate, jitter, and RTT, one from each group, and incorporate them into the model, since they are the most significant predictors among their interrelated variables. For simplicity and parsimony of the model, we omit the interaction terms of these three factors, although correlations between them have been observed. The resulting User Satisfaction Index (USI) model is then used to evaluate the satisfaction levels of Skype users.
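The collinearity screening mentioned above can be approximated with a simple correlation matrix over the candidate predictors; the sketch below is illustrative, and the table calls and its column names are assumptions rather than the actual variable names in [2].

    # Pairwise correlations between candidate QoS predictors; strongly correlated
    # columns form a collinear group, and only the most significant member of each
    # group is kept in the regression.
    candidates = ["bit_rate", "jitter", "rtt", "loss_rate"]
    print(calls[candidates].corr().round(2))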

As mentioned in Section 3.2.3, the risk score βᵀZ represents the level of the instantaneous hang-up probability and can thus be taken as a measure of user intolerance.

Accordingly, we define the User Satisfaction Index of a session as the negative of its risk score:

USI = −βᵀZ = 2.15 × log(bit rate) − 1.55 × log(jitter) − 0.36 × RTT.
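A direct transcription of the USI formula reads as follows; the measurement units must match those used when the model was fitted, which are not restated here.

    import math

    def user_satisfaction_index(bit_rate, jitter, rtt):
        """USI = -beta^T Z for a Skype session, per the model above."""
        return 2.15 * math.log(bit_rate) - 1.55 * math.log(jitter) - 0.36 * rtt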

We can further verify the proposed model by comparing the call durations predicted from the USI with the actual call durations. In Fig. 4.5, we group sessions by their USI, and plot the actual median duration, the predicted duration, and the 50% confidence band of the latter for each group. The results show that the predicted duration is rather close to the actual median time; moreover, for most groups the actual median time lies within the 50% confidence band of the prediction.

Although not shown here, we also validated the USI with a set of independent metrics derived from patterns of user interactivity [2]. The strong correlation between call duration and user interactivity suggests that our duration-based model is indeed representative of Skype user satisfaction.

4.2.3 Findings and Discussion

By deriving this objective, perceptual index, we can quantify the relative impact of the bit rate, the compound effect of delay jitter and packet loss, and network latency on the duration of Skype calls. Moreover, in [2], we derived that the relative importance of these three factors is approximately 46:53:1. Delay jitter and the loss rate are known to be critical to the perceived quality of real-time applications. To our surprise, the above results show that network latency has relatively little effect, whereas the source rate is almost as critical as jitter, which combines the effects of delay jitter and packet loss.

We believe these findings indicate that adapting toward a stable, higher-bandwidth channel would probably be the most effective way to increase user satisfaction with Skype. In contrast, selecting relay nodes by optimizing network delay, a technique often used by peer-to-peer overlay multimedia applications to find a quality detour, is less likely to make a significant difference to Skype users' satisfaction.

Chapter 5 Application

By understanding the most significant performance factors and their impacts on user satisfaction, we can further improve user experience and optimize resource allocation.

Given a quantified risk score for users leaving an application due to unsatisfactory service, systems can be modified accordingly. For example, network applications can be designed to automatically adapt to network quality in real time in order to improve user satisfaction. In particular, we might enhance the smoothness of high-risk sessions by increasing the packet rate or the degree of data redundancy, so that users have a better experience and are less likely to leave the application prematurely. Resource allocation could also be deliberately biased toward high-risk sessions; for example, scarce resources, such as processing power or network bandwidth, could be allocated more effectively based on session risk scores.

The developed model could also provide useful hints to resolve design trade-offs.

For instance, as the results in Section 4.1 indicate, players in ShenZhou Online are less tolerant of large delay variations than of high latency. Thus, providing a smoothing buffer at the client side, though it incurs additional delay, would improve the overall user experience. The concept of session time can also be used to design an alarm system for abnormal system conditions. To provide continuously high-quality services, providers must monitor system performance around the clock and detect problems in real time, i.e., before customer complaints flood the customer service center. However, monitoring a large-scale system in this way would be prohibitively expensive or even impractical. Instead, operators can track user session times, which is much more cost-effective. Since users are sensitive to certain system performance factors, a series of unusual departures over a short period might indicate abnormal system conditions and thus automatically trigger appropriate remedial action.
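As one possible realization of such an alarm, the sketch below counts departures in a sliding window and raises a flag when the count exceeds a baseline; the window length and threshold are illustrative parameters, not values derived in this work.

    from collections import deque

    class DepartureAlarm:
        """Flag abnormal system conditions from a burst of user departures."""

        def __init__(self, window_sec=300, threshold=50):
            self.window_sec = window_sec   # sliding-window length (illustrative)
            self.threshold = threshold     # departures tolerated per window (illustrative)
            self.departures = deque()

        def record_departure(self, timestamp):
            self.departures.append(timestamp)
            # Drop departures that have fallen out of the sliding window.
            while self.departures and timestamp - self.departures[0] > self.window_sec:
                self.departures.popleft()
            return len(self.departures) > self.threshold   # True => trigger remedial action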

Chapter 6 Conclusion

Unlike system-level performance, user satisfaction is intangible and cannot be measured directly.

The key to addressing this problem is our ability to measure user opinions objectively and efficiently. In this work, we have proposed a generalizable methodology, based on survival analysis, to quantify user satisfaction from session times, i.e., the length of time users stay with an application. The results of two case studies show that session time is strongly related to system performance factors, such as network QoS, and is thus a potential indicator of user satisfaction. With the derived model, service providers can further improve user experience and optimize resource allocation.


Bibliography

[1] K.-T. Chen, P. Huang, G.-S. Wang, C.-Y. Huang, and C.-L. Lei, “On the sensitivity of online game playing time to network QoS,” in Proceedings of IEEE INFOCOM’06, Barcelona, Spain, Apr. 2006, pp. 1–12.

[2] K.-T. Chen, C.-Y. Huang, P. Huang, and C.-L. Lei, “Quantifying Skype user satisfaction,” in Proceedings of ACM SIGCOMM’06, Pisa, Italy, Sept. 2006, pp. 399–410.

[3] D. R. Cox and D. Oakes, Analysis of Survival Data. Chapman & Hall/CRC, June 1984.

[4] ITU-T Recommendation P.800, “Methods for subjective determination of transmission quality,” 1996.

[5] U. Jekosch, Voice and Speech Quality Perception: Assessment and Evaluation. Springer, 2005.

[6] A. Rix, J. Beerends, M. Hollier, and A. Hekstra, “Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, 2001, pp. 73–76.

[7] ITU-T Recommendation P.862, “Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs,” Feb 2001.

[8] E. L. Kaplan and P. Meier, “Nonparametric estimation from incomplete observations,” Journal of the American Statistical Association, vol. 53, pp. 437–481, 1958.

[9] T. M. Therneau and P. M. Grambsch, Modeling Survival Data: Extending the Cox Model, 1st ed. Springer, August 2001.

[10] D. R. Cox and E. J. Snell, “A general definition of residuals (with discussion),” Journal of the Royal Statistical Society, Series B, vol. 30, pp. 248–275, 1968.

[11] “ShenZhou Online,” http://www.ewsoft.com.tw/.

[12] D. M. S. Ila and D. Lam, “Comparing the effect of habit in the online game play of Australian and Indonesian gamers,” in Proceedings of the Australia and New Zealand Marketing Association Conference, Adelaide, Australia, Dec. 2003.

[13] T. Beigbeder, R. Coughlan, C. Lusher, J. Plunkett, E. Agu, and M. Claypool, “The effects of loss and latency on user performance in Unreal Tournament 2003,” in Proceedings of ACM SIGCOMM 2004 Workshop on Network and System Support for Games (NetGames’04), 2004.