CHUN-YING HUANG, National Taiwan Ocean University, Taiwan
KUAN-TA CHEN, Academia Sinica, Taiwan
DE-YU CHEN, Academia Sinica, Taiwan
HWAI-JUNG HSU, Academia Sinica, Taiwan
CHENG-HSIN HSU, National Tsing Hua University, Taiwan
A. REAL-TIME VIDEO ENCODING PARAMETERS
GamingAnywhere supports the x264 [x264 2012] and vpxenc [WebM 2013] encoders for H.264/AVC and VP8 video encoding, respectively. These encoders are fairly comprehensive and expose many parameters that let users trade off bit rate, video quality, and encoding complexity. In this section, we conduct extensive experiments on an Intel i7 PC to find the best tradeoff settings for x264 and vpxenc. We use PSNR as the video quality metric and the encoding frame rate (fps) as the computational complexity metric. Typically, higher PSNR (higher video quality) comes with lower fps (higher computational complexity). Our goal is to achieve ∼60 fps at 720p (1280x720) with the highest rate-distortion (R-D) performance. We record 10-minute gameplay sessions in raw YUV format from three games, Batman, FEAR, and DOW, which are chosen from three different game genres (see Section 5.1). We then encode the raw videos with various parameters and report our observations.
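For reference, PSNR for 8-bit video is 10·log10(255²/MSE), where MSE is the mean squared error between the original and decoded samples. The sketch below computes it over raw I420 (YUV 4:2:0) frames such as the recorded gameplay. It assumes 8-bit samples, scores only the luma (Y) plane, and averages per-frame PSNR over the sequence; the text does not state which of these conventions the experiments used, and the function names are illustrative.

```python
import math
from typing import Iterator


def psnr(reference: bytes, distorted: bytes, max_value: int = 255) -> float:
    """PSNR in dB between two equally sized buffers of 8-bit samples."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("buffers must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical content: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)


def luma_planes(raw: bytes, width: int, height: int) -> Iterator[bytes]:
    """Yield the Y plane of each I420 frame in a raw YUV buffer (w*h*3/2 bytes per frame)."""
    luma_size = width * height
    frame_size = luma_size * 3 // 2
    for start in range(0, len(raw) - frame_size + 1, frame_size):
        yield raw[start:start + luma_size]


def mean_luma_psnr(original: bytes, decoded: bytes, width: int, height: int) -> float:
    """Average per-frame luma PSNR over two raw I420 sequences of the same resolution."""
    scores = [psnr(o, d) for o, d in zip(luma_planes(original, width, height),
                                         luma_planes(decoded, width, height))]
    if not scores:
        raise ValueError("no complete frames found")
    return sum(scores) / len(scores)
```

For a 720p recording, `mean_luma_psnr(original, decoded, 1280, 720)` would score the source against the decoded stream; pure Python is slow for full-length sequences, so a vectorized library is the practical choice at that scale.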
A.1 x264 Encoding Parameters
Mandatory parameters for real-time encoding. Several parameters are required for real-time x264 encoding. First, we need to disable bi-directional (B) frames. We also need to disable the lookahead buffers, which are used for frame type (I, P, or B) decisions and frame-level multi-threading. Both coding tools can be disabled with the convenience flag --tune zerolatency. Second, we need to enable slice-level multi-threading to leverage multi-core CPUs without incurring additional delay.
Slices are disjoint regions of each video frame. Slice-level multi-threading cuts each frame into t slices and allocates a thread to encode each slice. x264 provides (i) --sliced-threads to enable slice-level multi-threading, (ii) --slices to specify the number of slices, and (iii) --threads to control the number of threads. Third, x264 supports intra refresh to limit error propagation due to packet losses. With intra refresh, each frame contains a column of intra-coded macroblocks, and this column moves across the frame over time, so the whole picture is refreshed gradually rather than by a single intra-coded frame. x264 enables intra refresh via the flag --intra-refresh.
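To make the slice mechanism concrete, the sketch below splits a 720p frame into t horizontal slices along macroblock-row boundaries and hands each slice to its own worker thread. Because all slices belong to the same frame, the frame is complete as soon as its slowest slice finishes, with no frames buffered ahead, which is why slice-level threading adds no delay. The near-equal row split and the `encode_slice_stub` placeholder are illustrative assumptions, not x264's internal slice-placement logic, and Python threads are used only to show the structure.

```python
from concurrent.futures import ThreadPoolExecutor

MB_SIZE = 16  # H.264 macroblocks cover 16x16 luma pixels


def slice_row_ranges(height: int, num_slices: int) -> list[range]:
    """Split a frame's macroblock rows into num_slices contiguous, near-equal ranges."""
    mb_rows = -(-height // MB_SIZE)  # ceiling division: 720 px -> 45 rows
    if not 1 <= num_slices <= mb_rows:
        raise ValueError("need between 1 and mb_rows slices")
    base, extra = divmod(mb_rows, num_slices)
    ranges, start = [], 0
    for i in range(num_slices):
        size = base + (1 if i < extra else 0)  # first `extra` slices get one more row
        ranges.append(range(start, start + size))
        start += size
    return ranges


def encode_slice_stub(rows: range) -> str:
    # Hypothetical stand-in for a real per-slice encoder.
    return f"slice rows {rows.start}-{rows.stop - 1}"


def encode_frame_sliced(height: int, threads: int) -> list[str]:
    """Encode one frame with one slice per thread; results keep slice order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(encode_slice_stub, slice_row_ranges(height, threads)))


print(encode_frame_sliced(720, 4))
```

With t = 4 at 720p, the 45 macroblock rows split as 12, 11, 11, and 11 rows.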
Implications of other parameters. We exercise several other parameters and present their implications for video quality and computational complexity. --preset selects a predefined complexity level, ranging from ultrafast to veryslow; --bitrate sets the target bit rate; --me selects the motion estimation algorithm, which can be diamond (dia), hexagonal (hex), multi-hexagonal (umh), exhaustive (esa), or Hadamard exhaustive (tesa) search, in increasing order of complexity; and lastly, --merange specifies the motion vector search range, from 4 to 64 pixels. Moreover, we consider the number of threads t ∈ {1, 2, 4, 6, 8} and the GoP size g ∈ {12, 24, 48, 96, 192, 384}, where the GoP size is set by --keyint. Unless otherwise specified, we employ --preset fast, --bitrate 1000, --me hex, --merange 16, t = 4, and g = 384.
© 2010 ACM 1551-6857/2010/05-ART1 $15.00 DOI:http://dx.doi.org/10.1145/0000000.0000000
[Figure 18 plot: encoding frame rate (FPS) and video quality in PSNR (dB) for Batman, FEAR, and DOW across presets (from ultrafast), motion estimation algorithms (from diamond), and slice-level thread counts (from 1 thread).]
Fig. 18. Results from x264. Different presets: (a) encoding frame rates and (b) achieved video quality. Different motion estimation algorithms: (c) encoding frame rates and (d) achieved video quality. Implications of slice-level threading: (e) encoding frame rates and (f) achieved video quality.
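The experimental design above is one-factor-at-a-time: each parameter is swept while the others stay at the stated defaults. A minimal sketch of that enumeration follows. The motion estimation identifiers (dia, hex, umh, esa, tesa) are x264's own names; the exact set of presets exercised in Figure 18 is not listed in the text, so the full ultrafast-to-veryslow range below is an assumption, and --merange stays at its default because the tested values are not given.

```python
# Defaults stated in the text (bitrate in kbps).
DEFAULTS = {"preset": "fast", "bitrate": 1000, "me": "hex",
            "merange": 16, "threads": 4, "keyint": 384}

# Values swept one at a time.
SWEEPS = {
    "preset": ["ultrafast", "superfast", "veryfast", "faster", "fast",
               "medium", "slow", "slower", "veryslow"],  # assumed preset list
    "me": ["dia", "hex", "umh", "esa", "tesa"],  # low to high complexity
    "threads": [1, 2, 4, 6, 8],
    "keyint": [12, 24, 48, 96, 192, 384],
}


def one_factor_at_a_time(defaults: dict, sweeps: dict) -> list[tuple[str, dict]]:
    """Return (swept parameter, full configuration) pairs, varying one parameter at a time."""
    configs = []
    for name, values in sweeps.items():
        for value in values:
            configs.append((name, {**defaults, name: value}))
    return configs


configs = one_factor_at_a_time(DEFAULTS, SWEEPS)
print(len(configs), "configurations")
```

Each configuration maps onto x264 flags such as --preset, --me, --threads together with --slices, and --keyint; note that the default setting reappears once in every sweep.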
We plot the results for different presets in Figures 18(a) and 18(b). We find that the veryfast preset achieves 100+ fps in all considered games. Moreover, moving from veryfast to fast improves quality by at most 0.46 dB, at the cost of a frame-rate loss of at least 60 fps. Based on this observation, we recommend veryfast. We present the results for different motion estimation algorithms in Figures 18(c) and 18(d). Figure 18(d) shows that the algorithms lead to almost the same video quality, while Figure 18(c) reveals that the exhaustive search algorithms may reduce the encoding frame rate by about 50% for virtually no quality improvement. Hence, we recommend the diamond search algorithm (dia). We also find that increasing the search range beyond the default of 16 pixels has negligible impact on video quality and encoding complexity. We give the results for different numbers of threads in Figures 18(e) and 18(f). Figure 18(e) reveals that more slice-level threads result in higher encoding frame rates, and Figure 18(f) shows that slice-level multi-threading causes only insignificant quality degradation. For example, increasing the number of threads from 1 to 4 degrades video quality by at most 0.19 dB, while almost doubling the encoding frame rate. Hence, we recommend using 4 threads. We plot the video quality for different GoP sizes in Figure 19, which shows that Batman and FEAR are less sensitive to the GoP size. DOW is more vulnerable to smaller GoP sizes: the background of Real-Time Strategy (RTS) games does not move rapidly and thus offers more inter-frame redundancy, so the more frequent I-frames of a smaller GoP forgo more of the savings from inter-frame prediction. We find that the GoP size does not affect the computational complexity of x264. The best GoP size depends on the network condition, and we chose a medium GoP size of 48.
[Figure 19 plot: video quality in PSNR (dB) for Batman, FEAR, and DOW at GoP sizes 12, 24, 48, 96, 192, and 384.]
Fig. 19. Results from x264. Quality degradation due to smaller GoP size.
[Figure 20 plot: (a) video quality in PSNR (dB) and (b) encoding frame rate (FPS) versus bit rate (Kbps) for Batman, FEAR, and DOW.]
Fig. 20. Tradeoffs between rate and: (a) quality and (b) complexity. Results from x264.
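One way to see why the best GoP size depends on the network is that a lost packet can corrupt every subsequent frame until the next I-frame (or, with intra refresh, until the refresh cycle completes). Under the simplifying assumption that this cycle spans one GoP, the worst-case recovery time is the GoP size divided by the frame rate; the sketch below evaluates it at the 60 fps target. The function name is illustrative.

```python
def worst_case_recovery_s(gop_size: int, fps: float) -> float:
    """Upper bound (seconds) on how long a loss-induced artifact can persist,
    assuming errors propagate until the next full refresh, one GoP later."""
    if gop_size < 1 or fps <= 0:
        raise ValueError("gop_size must be >= 1 and fps > 0")
    return gop_size / fps


# GoP sizes from Figure 19, evaluated at the 60 fps target.
for g in (12, 24, 48, 96, 192, 384):
    print(f"GoP {g:3d}: up to {worst_case_recovery_s(g, 60):.1f} s")
```

Under this model, g = 48 bounds an artifact at 0.8 s, while g = 384 allows up to 6.4 s: larger GoPs save bits, as the DOW results suggest, but recover more slowly on lossy networks.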
Performance of x264. We report the complexity-rate-distortion relation under the recommended encoding parameters:
--profile main --preset faster --tune zerolatency --bitrate $r --ref 1 --me dia --merange 16 --intra-refresh --keyint 48 --sliced-threads --slices 4 --threads 4 --input-res 1280x720,
where $r is the encoding rate in kbps. We vary $r between 250 and 3000 kbps and plot the rate-quality and rate-complexity curves in Figure 20. Figure 20(a) reveals that we achieve a good video quality of 35 dB at bit rates of only about 250, 800, and 1500 kbps for Batman, FEAR, and DOW, respectively. For an excellent video quality of 40 dB [Wang et al. 2001, p. 29], Batman and FEAR require about 800 and 2000 kbps, respectively, while DOW demands slightly over 3000 kbps. These bit rates are widely available in modern access networks. Last, Figure 20(b) reports the encoding frame rates under different bit rates: at a video quality of 35 dB, the encoding frame rates are 160+, 130+, and 140+ fps, which are much higher than the rendering frame rates of most games.
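The recommended x264 command line can be generated for each point of the rate sweep as below. The sketch reproduces the flags above; the input and output file names are hypothetical placeholders, and the 250 kbps step between sweep points is an assumption, since the text gives only the 250 to 3000 kbps range.

```python
import shlex


def x264_realtime_args(rate_kbps: int, src: str = "gameplay_720p.yuv",
                       dst: str = "out.264") -> list[str]:
    """Recommended real-time x264 flags from the text, at a target rate in kbps."""
    return [
        "x264",
        "--profile", "main", "--preset", "faster", "--tune", "zerolatency",
        "--bitrate", str(rate_kbps), "--ref", "1",
        "--me", "dia", "--merange", "16",
        "--intra-refresh", "--keyint", "48",
        "--sliced-threads", "--slices", "4", "--threads", "4",
        "--input-res", "1280x720",  # raw YUV carries no header, so size is explicit
        "-o", dst, src,  # hypothetical file names
    ]


rates = range(250, 3001, 250)  # assumed 250 kbps step
commands = [x264_realtime_args(r, dst=f"out_{r}k.264") for r in rates]
print(shlex.join(commands[0]))
```

Each argument list can be passed directly to `subprocess.run(cmd, check=True)`; building lists rather than shell strings avoids quoting problems with file names.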
A.2 vpxenc Encoding Parameters
Mandatory parameters for real-time encoding. The parameters required for real-time vpxenc encoding are as follows. First, we enable one-pass (instead of two-pass) encoding using --passes=1. Second, we set --end-usage=cbr to use constant bit rate (CBR) encoding, which reduces rate fluctuations and is thus more suitable for real-time streaming. Third, real-time encoding dictates zero buffering, which is achieved by --buf-initial-sz=0, --buf-optimal-sz=0, and --buf-sz=0. Fourth, we enable multi-threading by (i) --threads to specify the number of threads and (ii) --token-parts to specify the number of token partitions, where each partition may be encoded by a different entropy encoder (in a different thread). We set the number of partitions to its maximal value, 8 (--token-parts=3, since the option takes the base-2 logarithm of the partition count), and vary the number of threads.
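The mandatory settings can be checked mechanically before launching the encoder. The sketch below records them as a table, converts a partition count into the logarithmic --token-parts value, and reports anything missing from an argument list. The helper names are hypothetical, and the checker deliberately recognizes only the `--name=value` form; it ignores short aliases such as `-p 1` and the space-separated form used elsewhere in this section (e.g., `--threads 4`).

```python
# Mandatory real-time settings from the text, in vpxenc's --name=value form.
MANDATORY_REALTIME = {
    "--passes": "1",
    "--end-usage": "cbr",
    "--buf-initial-sz": "0",
    "--buf-optimal-sz": "0",
    "--buf-sz": "0",
}


def token_parts_arg(num_partitions: int) -> str:
    """vpxenc's --token-parts takes log2 of the partition count (1, 2, 4, or 8)."""
    if num_partitions not in (1, 2, 4, 8):
        raise ValueError("VP8 supports 1, 2, 4, or 8 token partitions")
    return f"--token-parts={num_partitions.bit_length() - 1}"


def missing_realtime_flags(argv: list[str]) -> list[str]:
    """List mandatory settings absent from argv; only '--name=value' tokens are parsed."""
    given = dict(tok.split("=", 1) for tok in argv if tok.startswith("--") and "=" in tok)
    return [f"{name}={value}" for name, value in MANDATORY_REALTIME.items()
            if given.get(name) != value]


argv = ["--passes=1", "--end-usage=cbr", "--buf-initial-sz=0",
        "--buf-optimal-sz=0", "--buf-sz=0", token_parts_arg(8)]
print(missing_realtime_flags(argv))
```

An empty list means every mandatory setting is present; a wrong value (e.g., `--end-usage=vbr`) is reported just like a missing flag.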
Implications of other parameters. We study how the parameters affect video quality and computational complexity. Compared to x264, vpxenc exposes fewer encoding parameters. vpxenc supports 3 modes and 23 levels, which lead to different tradeoffs between video quality and computational complexity. The three modes are --best, --good, and --rt, where good and rt have 6 and 16 levels, respectively. The 16 levels of mode rt specify a target CPU usage (of vpxenc), from 0 to 100%. In total, we consider the following encoding modes/levels, from high to low complexity: best, good(0), good(1), . . . , good(5), and rt(100%), where good(n) denotes --good with --cpu-used=n. --target-bitrate sets the target rate, and --kf-max-dist sets the maximal GoP size. If not otherwise specified, we employ --good, --cpu-used=4, --target-bitrate=100, --threads 4, and --kf-max-dist 384.
[Figure 21 plot: encoding frame rate (FPS) and video quality in PSNR (dB) for Batman, FEAR, and DOW across modes/levels (through RT (100%)), thread counts (1 to 8 threads), and GoP sizes (12 to 384).]
Fig. 21. Results from vpxenc. Different modes/levels: (a) encoding frame rates and (b) achieved video quality. Different numbers of threads: (c) encoding frame rates and (d) achieved video quality. Different GoP sizes: (e) encoding frame rates and (f) achieved video quality.
[Figure 22 plot: (a) video quality in PSNR (dB) and (b) encoding frame rate (FPS) versus bit rate (Kbps) for Batman, FEAR, and DOW.]
Fig. 22. Tradeoffs between rate and: (a) quality and (b) complexity. Results from vpxenc.
[Figure 23 plot: average processing delay (ms) versus number of instances (1 to 4); bar labels read 27, 38, 54, and 67 ms.]
Fig. 23. Processing delays of multiple GamingAnywhere instances running the game Cube 2: Sauerbraten.
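The good(n) notation corresponds to --good combined with --cpu-used=n, as the good(5) recommendation in the next paragraph spells out. A small mapping helper is sketched here; it is hypothetical and leaves rt(100%) unmapped, because the text does not give the --cpu-used value behind that level.

```python
from typing import Optional


def vpxenc_mode_args(mode: str, level: Optional[int] = None) -> list[str]:
    """Map the text's mode/level notation to vpxenc flags: best, or good(n) -> --good --cpu-used=n."""
    if mode == "best":
        return ["--best"]
    if mode == "good":
        if level is None or not 0 <= level <= 5:
            raise ValueError("good mode takes a level from 0 to 5")
        return ["--good", f"--cpu-used={level}"]
    # rt(100%) is not mapped: the text does not state which --cpu-used value it uses.
    raise ValueError(f"unsupported mode: {mode}")


# Modes/levels exercised in Figure 21, excluding rt(100%), from high to low complexity.
SWEEP = [("best", None)] + [("good", n) for n in range(6)]
for mode, level in SWEEP:
    print(mode, level, vpxenc_mode_args(mode, level))
```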
We plot the results for different modes/levels in Figures 21(a) and 21(b). We find that vpxenc achieves at most 50 fps, except with good(5) (--good and --cpu-used=5). Since the video quality drop from good(4) to good(5) is negligible, we recommend good(5). We present the results for different numbers of threads in Figures 21(c) and 21(d). Figure 21(d) shows that the video quality drops as the number of threads increases. This can be attributed to the design of multiple partitions and entropy coders: more partitions mean less redundancy to exploit. Figure 21(c) reveals that 6 threads are required for 60 fps, and thus we recommend 6 threads. The impact of GoP size is shown in Figures 21(e) and 21(f). Although we observe a tradeoff between video quality and computational complexity, the deviation is moderate, and the best tradeoff depends on the network condition. We chose a medium GoP size of 48.

Table I. Profiling summary for Cube 2: Sauerbraten, without and with GamingAnywhere
Metric | Local gameplay (without GA) | Remote gameplay (with GA)
CPU cycles | 66,263,040,386 (100%) | 613,822,039,953 (100%)
Consumed by, 1st | 22.80% libdricore.so | 47.99% libdricore.so
Consumed by, 2nd | 16.02% sauerbraten† | 20.48% libx264.so
Consumed by, 3rd | 13.07% [kernel] | 12.27% libga.so
Consumed by, 4th | 12.65% i965_dri.so | 4.92% libc.so
Consumed by, 5th | 8.84% libc.so | 3.54% [kernel]
Consumed by, 6th | 5.61% libjpeg.so | 2.42% sauerbraten
Branches | 9,816,197,930 | 59,332,965,017
Branch misses | 274,832,612 | 1,047,463,987
Branch miss rate | 2.79% | 1.76%
Cache misses | 57,157,352 | 1,018,626,894
dTLB loads | 22,944,538,803 | 195,569,272,710
dTLB stores | 12,663,900,914 | 90,938,730,599
dTLB store misses | 11,571,948 | 50,717,095
dTLB store miss rate | 0.09% | 0.05%
Invoked system calls | 1,368,700 (100%) | 804,627 (100%)
Contributed by, 1st | 1,083,856 (79%) ioctl | 491,002 (61%) ioctl
Contributed by, 2nd | 142,145 (10%) clock_gettime | 68,274 (8%) clock_gettime
Contributed by, 3rd | 38,403 (3%) nanosleep | 65,466 (8%) futex
Contributed by, 4th | 24,471 (2%) socketcall | 51,969 (6%) nanosleep
Contributed by, 5th | 19,152 (1%) read | 31,817 (4%) socketcall
†sauerbraten is the process name of the game Cube 2: Sauerbraten.
Performance of vpxenc. We report the complexity-rate-distortion relation under the recommended encoding parameters:
--i420 -w 1280 -h 720 -p 1 -t 6 --token-parts=3 --good --cpu-used=5 --end-usage=cbr --target-bitrate=$r --fps=30000/1000 --buf-sz=0 --buf-initial-sz=0 --buf-optimal-sz=0 --kf-max-dist=48.
We vary $r between 250 and 3000 kbps and plot the rate-quality and rate-complexity curves in Figure 22. Figure 22(a) shows that vpxenc achieves 35 dB at bit rates of about 250, 1000, and 3000 kbps for Batman, FEAR, and DOW, respectively. Figure 22(b) reveals that the corresponding encoding frame rates are 170+ fps in all considered games. Compared to x264 (Figure 20), vpxenc achieves slightly higher frame rates (by up to 30 fps) at the expense of lower coding efficiency. Nonetheless, the resulting bit rates are available in most access networks nowadays.
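As with x264, the recommended vpxenc command can be generated for each sweep point. The sketch reproduces the flags above verbatim; the file names are hypothetical. Note that -p and -t are vpxenc's short forms of --passes and --threads, and that --fps takes a rational number, so 30000/1000 declares a 30 fps input rate.

```python
from fractions import Fraction
from typing import Optional


def vpxenc_realtime_args(rate_kbps: int, src: str = "gameplay_720p.yuv",
                         dst: str = "out.webm") -> list[str]:
    """Recommended real-time vpxenc flags from the text, at a target rate in kbps."""
    return [
        "vpxenc", "--i420", "-w", "1280", "-h", "720",
        "-p", "1", "-t", "6",       # one pass, 6 threads
        "--token-parts=3",          # log2 value: 2**3 = 8 partitions
        "--good", "--cpu-used=5", "--end-usage=cbr",
        f"--target-bitrate={rate_kbps}", "--fps=30000/1000",
        "--buf-sz=0", "--buf-initial-sz=0", "--buf-optimal-sz=0",
        "--kf-max-dist=48",
        "-o", dst, src,             # hypothetical file names
    ]


def declared_fps(argv: list[str]) -> Optional[Fraction]:
    """Return the rational frame rate given by --fps=num/den, if present."""
    for tok in argv:
        if tok.startswith("--fps="):
            return Fraction(tok.split("=", 1)[1])
    return None


cmd = vpxenc_realtime_args(1000)
print(declared_fps(cmd))
```

A sweep is then `[vpxenc_realtime_args(r, dst=f"out_{r}k.webm") for r in range(250, 3001, 250)]`, again assuming a 250 kbps step.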
B. ADDITIONAL EXPERIMENTAL RESULTS
Table I presents the detailed profiling results; the corresponding descriptions are given in Section 5.6. Figure 23 gives the results of running multiple GamingAnywhere instances, and the detailed discussions are given in Section 5.7.
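The derived rates in Table I follow directly from its raw counters. The sketch below recomputes them; the table's percentages appear to be truncated, rather than rounded, to two decimals (for example, 274,832,612 / 9,816,197,930 is about 2.7998%, listed as 2.79%), so the sketch truncates the same way.

```python
import math

# Raw counters from Table I: (local gameplay without GA, remote gameplay with GA).
COUNTERS = {
    "branches": (9_816_197_930, 59_332_965_017),
    "branch_misses": (274_832_612, 1_047_463_987),
    "dtlb_stores": (12_663_900_914, 90_938_730_599),
    "dtlb_store_misses": (11_571_948, 50_717_095),
}


def miss_rate_pct(misses: int, total: int) -> float:
    """Miss rate as a percentage of all events of that kind."""
    return 100 * misses / total


def truncate2(x: float) -> float:
    """Truncate (not round) to two decimals, matching the table's presentation."""
    return math.floor(x * 100) / 100


for i, setting in enumerate(("local", "remote")):
    branch = miss_rate_pct(COUNTERS["branch_misses"][i], COUNTERS["branches"][i])
    dtlb = miss_rate_pct(COUNTERS["dtlb_store_misses"][i], COUNTERS["dtlb_stores"][i])
    print(f"{setting}: branch miss rate {truncate2(branch):.2f}%, "
          f"dTLB store miss rate {truncate2(dtlb):.2f}%")
```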