Organization - Completing Continuous Gaze Input with Progressive Visual Guidance

This paper is organized as follows.

Chapter 2 reviews previous work in two research areas. Chapter 3 explores the design space through two user studies that address the two issues found in a pilot study. Chapter 4 presents the design and implementation of GazeBeacon, together with an evaluation study. Finally, Chapter 5 briefly summarizes the contributions of this paper and discusses future work.

Chapter 2 Related Work

2.1 Visual Guidance for Gesture-Based Interaction

Figure 2.1: Examples of discrete multistep guiding systems. From left to right: GestureBar [7], Arpège [13], Augmented Letters [22].

Gesture-based interfaces provide a direct and natural way of interacting, but most of them are not self-revealing to novice users. Such interfaces need hints that show users (1) all available commands and (2) how to issue those commands.

One approach is simply to list all available commands the first time users encounter the interface. The crib-sheet guide and the Adaptive Guide [1] presented novice users with a set of gesture trajectories all at once, right after the guidance was triggered. Other techniques, by contrast, display only part of the gesture commands and guide users step by step.

CHAPTER 2. RELATED WORK 6

GestureBar [7] resembled a crib sheet but instead provided multistep animated demonstrations of gestures. Arpège [13] used finger-by-finger feedforward to gradually guide users' fingers into multi-finger chords. Augmented Letters [22] combined mark-based menus with free-form letter gestures to extend their functionality. However, the feedforward mechanisms above are all discrete and stepwise, which may make them unsuitable for gaze-gesture interfaces. To guide users' gaze gradually, a guiding system for gaze-gesture interaction needs a continuously moving target that users can track.

Figure 2.2: Examples of continuous guiding systems. From left to right: OctoPocus [3], ShadowGuides [12], Gesture Play [6].

OctoPocus [3] introduced the concept of a dynamic guide. By displaying the octopus-like remaining paths of the gesture templates, with the portion the user has already drawn subtracted, it helped users learn, execute, and remember gestures. TouchGhost [26] demonstrated the available multi-touch interactions by simulating real actions with virtual animated hands on the interface. ShadowGuides [12] showed users' current gestures together with predicted actions rendered as virtual-shadow feedforward, indicating possible ways to finish multi-finger or whole-hand gestures. Gesture Play [6] used physical-metaphor feedforward that responded dynamically to users' gesture input, motivating them to perform multi-finger gestures in a playful way. SimpleFlow [4] provided a scale-free guiding system for uni-stroke gestural interfaces, enhancing the feedforward mechanism with auto-completion and gesture prediction. Delamare et al. [10] systematically organized and unified the design factors of existing guiding systems for gestural interfaces and provided an online tool to help future researchers design such systems. Still, none of the guiding mechanisms they covered focused on gaze-controlled interfaces. Because of the particular nature of human eyes, gaze-gesture interfaces call for further investigation.
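The core of a dynamic guide can be sketched as follows: measure the arc length the user has already drawn, cut that much off the start of each template, and display only the remainder. This is a minimal sketch under our own assumptions. Templates are lists of (x, y) points, and the helper names `path_length` and `remaining_path` are hypothetical. It omits OctoPocus's template scaling, filtering of unlikely templates, and rendering.

```python
import math

Point = tuple[float, float]

def path_length(points: list[Point]) -> float:
    """Total arc length of a polyline."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def remaining_path(template: list[Point], drawn: float) -> list[Point]:
    """Drop the first `drawn` units of arc length from `template` (hypothetical helper)."""
    if drawn <= 0:
        return list(template)
    travelled = 0.0
    for i in range(1, len(template)):
        a, b = template[i - 1], template[i]
        seg = math.dist(a, b)
        if travelled + seg >= drawn:
            # The cut point falls inside this segment: interpolate it.
            t = (drawn - travelled) / seg if seg > 0 else 0.0
            cut = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            return [cut] + list(template[i:])
        travelled += seg
    return [template[-1]]  # the whole template has been consumed

# Usage: an L-shaped template after the user has drawn 50 px of input.
L_TEMPLATE = [(0, 0), (0, 100), (60, 100)]
guide = remaining_path(L_TEMPLATE, 50.0)
```

Because the guide is driven only by the accumulated input length, anything that inflates that length (such as eye jitter) makes the displayed remainder shrink too early.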

2.2 Visual Cues for Gaze Gesture Interface

Gaze-gesture interfaces use predefined sequences of relative eye movements to distinguish users' gesture commands from natural eye movements. Because the commands depend on relative movements rather than precise absolute gaze positions, this approach largely avoids the accuracy issues caused by eye jitter as well as the problem of unintended actions on gaze-controlled interfaces.

Drewes and Schmidt [11] were the first to introduce gaze gestures. They designed an algorithm that translates sequences of location-independent, discrete gaze strokes into interface commands. EyeWrite [31] combined this concept with its authors' earlier work, EdgeWrite [30], and proposed an effective and practical eye-typing method. Both used square and saltire (diagonal-cross) helper lines or the edges of the display as visual cues to guide users through eight-direction gaze movements.
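The eight-direction movements mentioned above can be illustrated by quantizing each gaze stroke vector into one of eight 45° sectors. This sketch is not the algorithm of either cited system. The compass labels, the 45° sectors centred on the axes, and the minimum stroke length `min_len` used to discard jitter are hypothetical choices. Screen coordinates are assumed to have y pointing down.

```python
import math

# Compass labels for eight 45-degree sectors, counter-clockwise from east.
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]

def quantize_stroke(start, end, min_len=80.0):
    """Map a gaze stroke from `start` to `end` (screen px, y down) to a direction label.

    Returns None for strokes shorter than `min_len` (an assumed jitter threshold).
    """
    dx = end[0] - start[0]
    dy = start[1] - end[1]  # flip y so that "up" on screen is positive
    if math.hypot(dx, dy) < min_len:
        return None
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    sector = int((angle + 22.5) // 45) % 8  # centre each sector on its axis
    return DIRECTIONS[sector]

def encode_gesture(fixations, min_len=80.0):
    """Turn a sequence of fixation centres into a list of direction labels."""
    labels = []
    for a, b in zip(fixations, fixations[1:]):
        d = quantize_stroke(a, b, min_len)
        if d is not None:
            labels.append(d)
    return labels
```

For example, fixations moving right, then up, then down-left produce the sequence `E`, `N`, `SW`. A command table can then be matched against such label sequences regardless of where on the screen the gesture was made.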

Møllenbach et al. [21] introduced single gaze gestures, which further simplified earlier gestures into saccades from one side of the screen to the opposite side. Istance et al. [18] revised the gestures into two-legged and three-legged gaze gestures and tested them in an MMORPG, where the technique allowed gamers to keep their gaze more concentrated on the center of the screen. Both works used hot zones on the display to indicate the beginning and end of a gesture.

Gazing with pEYEs [16] applied hierarchical marking menus to a gaze-controlled interface, using pie-shaped slices to show users the available commands and the direction in which to execute each one. Now Dasher! Dash Away! [24], Pies with EYEs [25], and the work of Best and Duchowski [5] used boundary crossing to issue commands, a concept inspired by the well-known text entry technique Dasher [29]: an item is selected when the gaze crosses a graphical boundary line on the interface.
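Boundary-crossing selection reduces to testing whether the segment between two consecutive gaze samples intersects a boundary segment. The sketch below uses the standard orientation (cross-product) test. The function names, and the choice to ignore touching or collinear cases so that jitter resting on a line does not select anything, are our own assumptions rather than details of the cited systems.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def crosses(g0, g1, b0, b1):
    """True if the gaze segment g0->g1 properly crosses the boundary segment b0-b1."""
    d1 = _orient(b0, b1, g0)
    d2 = _orient(b0, b1, g1)
    d3 = _orient(g0, g1, b0)
    d4 = _orient(g0, g1, b1)
    # Endpoints must lie strictly on opposite sides of each other's line.
    return d1 * d2 < 0 and d3 * d4 < 0

def first_crossing(gaze, boundaries):
    """Index of the first boundary crossed by the gaze trail, or None."""
    for g0, g1 in zip(gaze, gaze[1:]):
        for idx, (b0, b1) in enumerate(boundaries):
            if crosses(g0, g1, b0, b1):
                return idx
    return None
```

Because selection depends only on the sign changes between two samples, the test tolerates a constant calibration offset better than dwell-based selection of a small target.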

Figure 2.3: Examples of mark-based gaze-gesture interfaces. From left to right: EyeWrite [31], Gazing with pEYEs [16], Now Dasher! Dash Away! [24].

Isokoski et al. [17] pioneered off-screen gaze gestures. By using the area outside the screen, they extended the interaction space so that commands were easy to distinguish from normal on-screen eye movements. This efficient form of gaze-gesture interaction was adopted in later work [15] [14]. In these techniques, the physical edges of the screen serve as obvious visual cues, which both lowers users' execution effort and reduces the visual complexity of the interface.

The works in this section improved performance and reduced error rates by decomposing gesture strokes into sequences of linear directional movements, which can be performed as saccades between fixations. However, each system defines its own gesture vocabulary, which cannot be applied to arbitrary gesture shapes, so users face an inconsistent experience across systems.

Therefore, we aim to propose a guiding system for graffiti-based gaze gestures that can be applied directly to any existing technique.

Chapter 3 Design Space

3.1 Pilot Study: Understanding OctoPocus

To better understand how users interact with a dynamic guiding system, we implemented a prototype interface inspired by OctoPocus and conducted an in-lab pilot study to observe user behavior more closely.

Figure 3.1: (a) The experimental environment used throughout this paper. (b) A screenshot of the calibration process.

Figure 3.1(a) shows the experimental environment. Each participant sat comfortably at a desk, fifty centimeters from a 23" display with a resolution of 1920x1080. Participants' gaze was recorded by a Tobii EyeX eye tracker mounted below the screen, whose average gaze estimation error is reported as 0.4° of visual angle.

CHAPTER 3. DESIGN SPACE 11

Before the experiment, participants calibrated their gaze input with Tobii's built-in software. The gaze input error was kept below fifty pixels at each of the nine calibration points shown in Figure 3.1(b). All subsequent experiments in this paper include this calibration step.
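As a rough sanity check, the 0.4° tracker error and the fifty-pixel calibration bound can be converted into each other using the viewing distance and pixel pitch. This sketch rests on assumptions that are ours, not measurements from the study. It assumes a 16:9 panel whose full 23-inch diagonal is active, with square pixels, and gaze directed at the screen centre.

```python
import math

def pixel_pitch_cm(diagonal_in, res_w, res_h):
    """Physical size of one pixel, assuming square pixels filling the whole diagonal."""
    return diagonal_in * 2.54 / math.hypot(res_w, res_h)

def angle_to_px(angle_deg, distance_cm, pitch_cm):
    """On-screen extent (px) subtended by `angle_deg` at the screen centre."""
    size_cm = 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)
    return size_cm / pitch_cm

def px_to_angle(px, distance_cm, pitch_cm):
    """Visual angle (degrees) subtended by `px` pixels at the screen centre."""
    return math.degrees(2 * math.atan(px * pitch_cm / 2 / distance_cm))

pitch = pixel_pitch_cm(23, 1920, 1080)         # roughly 0.0265 cm per pixel
print(round(angle_to_px(0.4, 50, pitch), 1))    # reported tracker error, in px
print(round(px_to_angle(50, 50, pitch), 2))     # 50 px calibration bound, in degrees
```

Under these assumptions, the nominal 0.4° error corresponds to roughly 13 px on this setup, while the fifty-pixel calibration bound corresponds to about 1.5° of visual angle.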

Figure 3.2: The gesture set used in the pilot study, including six gestures combining straight lines and curves.

As shown in Figure 3.2, we defined six basic uni-stroke single-character gestures for the pilot study (L, M, N, A, C, and S) to cover combinations of straight and curved strokes. Five in-lab participants performed each gesture three times and were debriefed after the study.

The pilot study revealed two issues that primarily affect the efficiency and execution effort of the gaze-gesture interface.


Figure 3.3: The two issues found in the pilot study. (a) The miscalculation problem. (b)(c) The misestimation problem.

First, unavoidable eye jitter makes the input path crooked and rotated, resulting in a low recognition rate (Figure 3.3(a)). Moreover, the guiding system used in OctoPocus is generated from the length of the gesture path the user has already input. Unlike a mouse, which can stay perfectly still at a specific location, the eyes are subject to natural jitter. This jitter perturbs the gaze point so that the same stretch of input is counted repeatedly, which inflates the input length and shrinks the remaining guide path even when the user has not moved along it.

Second, users sometimes misestimated the prediction provided by the guiding system (Figure 3.3(b)(c)). On mouse-based or stylus-based interfaces, the muscles of the arm and hand provide better control and faster reactions to upcoming paths. Gaze control, by contrast, suffers from misestimations at corners: because the eyes serve simultaneously as the sensor that perceives the guidance and the actuator that performs the input, users react late to changes in the path.

From the system's perspective, these two issues make gestures less likely to be recognized correctly. From the user's perspective, they make gaze-gesture interfaces less satisfying to use. Solving them should make gaze-gesture paths smoother and less likely to be misidentified by the system, improving users' performance and accuracy.


3.2 Solving the Miscalculation Problem of Path Length

We propose two solutions: (1) smoothing the path by filtering the gaze points, and (2) reducing the impact of jitter on the guiding system by resampling the path before calculating its length.

3.2.1 Gaze Point Smoothing Filter

We considered the One Euro Filter [9] and the Kalman filter as candidate filtering mechanisms. The former is a speed-based filter that dynamically adjusts its cutoff frequency according to movement speed. Since eye jitter occurs mostly while the gaze is fixating or moving slowly, the One Euro Filter seemed appropriate for gaze interactions based on eye switching, that is, saccades between fixations.
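For comparison, the One Euro Filter's speed-adaptive cutoff can be sketched as below, with one instance per screen axis. The parameter values (`min_cutoff=1.0`, `beta=0.007`, `d_cutoff=1.0`) are illustrative defaults we chose, not values from this study. The sketch also assumes strictly increasing timestamps in seconds.

```python
import math

class LowPass:
    """Exponential smoothing with a per-call smoothing factor alpha."""
    def __init__(self):
        self.y = None

    def __call__(self, x, alpha):
        self.y = x if self.y is None else alpha * x + (1 - alpha) * self.y
        return self.y

class OneEuroFilter:
    """1-D One Euro filter: the cutoff frequency rises with the smoothed speed."""
    def __init__(self, min_cutoff=1.0, beta=0.007, d_cutoff=1.0):
        self.min_cutoff, self.beta, self.d_cutoff = min_cutoff, beta, d_cutoff
        self.x_filt, self.dx_filt = LowPass(), LowPass()
        self.t_prev = None

    @staticmethod
    def _alpha(cutoff, dt):
        tau = 1.0 / (2 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, x, t):
        if self.t_prev is None:           # first sample passes through unchanged
            self.t_prev = t
            self.dx_filt(0.0, 1.0)
            return self.x_filt(x, 1.0)
        dt = t - self.t_prev              # assumed > 0
        self.t_prev = t
        dx = (x - self.x_filt.y) / dt     # speed relative to the last filtered value
        edx = self.dx_filt(dx, self._alpha(self.d_cutoff, dt))
        cutoff = self.min_cutoff + self.beta * abs(edx)
        return self.x_filt(x, self._alpha(cutoff, dt))

class GazeOneEuro:
    """Filters (x, y) gaze samples with one independent OneEuroFilter per axis."""
    def __init__(self, **params):
        self.fx, self.fy = OneEuroFilter(**params), OneEuroFilter(**params)

    def __call__(self, point, t):
        return (self.fx(point[0], t), self.fy(point[1], t))
```

When the gaze is nearly still, the cutoff stays close to `min_cutoff` and jitter is smoothed heavily. During fast movement, `beta` raises the cutoff so that lag stays small. This is exactly the behavior that suits fixation-based interaction.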

However, for gliding gaze gestures, which do not rely on saccades jumping between fixations, the Kalman filter serves users better because it follows the gaze slowly and is less sensitive to jumps. We therefore chose it as our filtering mechanism (process error covariance = 0.3, measurement error covariance = 18). Although filtering slows the response to gaze movement, it is applied only while the guiding system is triggered, so it does not affect interaction through normal eye movements.
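The Kalman configuration above can be sketched as a pair of independent one-dimensional filters with a constant-position (random-walk) state model, using the reported process error covariance 0.3 and measurement error covariance 18. The source reports only these two values. The state model, the per-axis decoupling, and the initial estimate covariance below are our assumptions.

```python
class Kalman1D:
    """Scalar Kalman filter with a random-walk (constant-position) model."""
    def __init__(self, q=0.3, r=18.0):
        self.q, self.r = q, r        # process and measurement error covariances
        self.x = None                # current estimate of the gaze coordinate (px)
        self.p = r                   # initial estimate covariance (assumption)

    def update(self, z):
        if self.x is None:           # initialise on the first measurement
            self.x = z
            return self.x
        p_pred = self.p + self.q                 # predict: position unchanged, uncertainty grows
        k = p_pred / (p_pred + self.r)           # Kalman gain
        self.x = self.x + k * (z - self.x)       # correct toward the measurement
        self.p = (1 - k) * p_pred
        return self.x

class GazeKalman:
    """Filters (x, y) gaze samples with one independent Kalman1D per axis."""
    def __init__(self, q=0.3, r=18.0):
        self.kx, self.ky = Kalman1D(q, r), Kalman1D(q, r)

    def update(self, point):
        return (self.kx.update(point[0]), self.ky.update(point[1]))
```

Under this model the gain settles at about 0.12. Each new sample therefore moves the estimate only about 12% of the way toward it, which matches the slow-following, jump-insensitive behavior described above.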

3.2.2 Resample for Length Calculation

Perturbations caused by eye jitter make the input length accumulate repeatedly, which continuously shortens the remaining (subtracted) path of the gesture templates. We therefore resample the input points once before calculating the length, reducing the impact of eye jitter.
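A minimal sketch of this step follows. It assumes the equidistant resampling routine of the $1 Recognizer (the recognizer also used later in this chapter). It further assumes that the whole input path so far is resampled before its length is measured. Every resampled point lies on the original polyline in order, so the resampled length can never exceed the raw length, and back-and-forth jitter is partly cut off. The function names and the zigzag jitter in the usage are illustrative.

```python
import math

def path_length(pts):
    """Total arc length of a polyline."""
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def resample(pts, n=16):
    """Resample a polyline to n points spaced equally along its arc length ($1-style)."""
    if n < 2:
        raise ValueError("need at least two resampling points")
    if not pts:
        return []
    interval = path_length(pts) / (n - 1)
    if interval == 0:                     # degenerate path: all samples identical
        return [pts[0]] * n
    src, out, acc, i = list(pts), [pts[0]], 0.0, 1
    while i < len(src):
        a, b = src[i - 1], src[i]
        d = math.dist(a, b)
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            out.append(q)
            src.insert(i, q)              # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:                   # floating-point rounding can leave us one short
        out.append(src[-1])
    return out[:n]

# Usage: a 500 px horizontal stroke with +/-3 px vertical jitter every 5 px.
jittery = [(5.0 * i, 3.0 if i % 2 else -3.0) for i in range(101)]
raw_len = path_length(jittery)
smooth_len = path_length(resample(jittery, 16))
```

Here the raw length is inflated well beyond 500 px by the zigzag, while the 16-point resampled length stays close to the stroke's true extent.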


Beforehand, we conducted a user study to better understand the actual influence of eye jitter.

Task and Procedure

Figure 3.4: The interface and procedure of the study on the influence of eye jitter.

This study used the same environment as the pilot study.

Figure 3.4 shows the interface. Participants were instructed to fixate on the black dot at the center of the screen for five seconds, a duration matching the measured mean completion time of a gesture path. We then recorded (1) how the calculated length of the gesture path varied over this period and (2) the minimum and maximum distances of horizontal and vertical eye jitter.

Participants

Six in-lab participants (4 female) were recruited, aged 21 to 23 (mean = 21.67, SD = 1.21). All had normal or corrected-to-normal vision.


Result and Discussion

Figure 3.5: Results of the study on the influence of eye jitter.

Figure 3.5 shows, for different numbers of resampling points, the mean curves of the input length accumulated from eye jitter across the six participants.

As for the number of resampling points, the fewer points we use, the smaller the impact of jitter on the length calculation, since resampling suppresses small changes in length. However, resampling to too few points can make the computed input length fluctuate severely. This produces discontinuous guiding paths when the subtracted templates are concatenated onto the input according to the calculated length. The participants' debriefing showed that resampling to sixteen points was a reasonable choice: given that the shortest of the six gestures is 500 px long, the resulting jumps of the guiding system were too subtle for participants to notice.

3.3 Solving the Misestimation Problem of Path Guidance

We found that user behavior with OctoPocus on gaze-controlled interfaces differs considerably from that on mouse-based or stylus-based interfaces, because users have much more conscious control over the latter. On gaze-gesture interfaces, gaze behaves more like quickly tracking a specific moving object than consciously traveling along a path. Consequently, when users' gaze is concentrated on the end point of the path, they may fail to react in time to changes in the upcoming path guidance.

Figure 3.6: Different interpretations of the path guidance

Meanwhile, we found that if the guiding system provides only guiding paths, users might (1) be unsure where to put their focus (Figure 3.6(a)), (2) switch their focus randomly to anywhere along the path, or (3) focus on an approximate overall form of the whole path. These inappropriate interpretations ultimately turn the guidance into misleading cues.

Hence, we added a focus point at the end of the original guiding path, actively prompting users to keep their focus in a consistent place and unifying their varied interpretations of the guidance. Furthermore, we designed two different behaviors for the focus point and tested them in the following user study. By comparing them with a guiding system that provides only gesture-ink feedback, we aimed to learn how the visual form of the guiding system influences users' performance of gaze gestures.


3.3.1 Modifications for Gliding Interaction

Gliding describes eye movement in which the gaze smoothly glides along a predefined path. To help users produce better gesture paths, we made two modifications to the prototype before the experiment, based on the findings of the pilot study.

Precision of the Feedback of Gaze Point

The accuracy of eye trackers is limited by natural and technical constraints. If gaze-point feedback is rendered in a precise visual form, such as a pointer or a crosshair, users may notice a gap between where they intend to fixate and the actual input point (Figure 3.7(a)). This easily leads users to unconsciously adjust their gaze and chase the feedback point. The resulting unsteadiness makes fixations and gliding eye movements harder to perform.

Figure 3.7: The modification to the gaze-point feedback. (a) Users chase the feedback point because of the eye tracker's offset error. (b) A blurred, generalized form of gaze-point feedback that masks the eye tracker's offset error.

Therefore, we changed the gaze-point feedback to a blurred, generalized circular form (Figure 3.7(b)). Unlike the original feedback, the new visual form blurs the exact positions of the intended fixation point and the actual input point, making users less likely to notice the offset between them. The interaction thus changes from projecting the gaze directly onto the display to gently moving and pushing a circular object with the gaze.
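One way to produce such blurred feedback is a radial Gaussian opacity mask drawn centred on the filtered gaze point. This sketch only illustrates the idea. The Gaussian profile, the `radius_px` value, and the default `sigma_px = radius_px / 2` are our assumptions, not the prototype's actual rendering parameters.

```python
import math

def blurred_dot_alpha(radius_px=40, sigma_px=None):
    """Opacity mask (rows of floats in [0, 1]) for a soft circular gaze cursor.

    A radial Gaussian has no crisp centre or edge, so a small offset between
    the intended fixation point and the reported gaze point is hard to see.
    """
    sigma = sigma_px if sigma_px is not None else radius_px / 2.0
    size = 2 * radius_px + 1
    mask = []
    for j in range(size):
        row = []
        for i in range(size):
            r2 = (i - radius_px) ** 2 + (j - radius_px) ** 2
            row.append(math.exp(-r2 / (2 * sigma * sigma)))
        mask.append(row)
    return mask
```

A radius comparable to, or larger than, the tracker's typical error would cover the offset. The exact value is a design choice the text does not specify.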

Correction of the Feedforward of Guidance

Path correction is also an important part of guiding systems. OctoPocus [3] directly relocates the upcoming path guidance onto the user's current gaze point (Figure 3.8(a)). The advantage of this method is that users are not constrained by the initial start position of the menu. However, users may not notice offset errors they have already made, and these errors may keep accumulating.

Figure 3.8: The modification to the guidance feedforward. (a) Relocated position, which frees the guidance from the constraint of the intended gesture template. (b) Smooth correction, which subtly redirects users back to the intended gesture template. (c) A combination of both mechanisms weighted by the real-time recognition rate.

Multimodal motion guidance [23] used a form of feedforward presentation called smooth correction. By linearly interpolating between the initial gesture guidance and the relocated position, it gradually redirects a user's already-offset path back to the intended trajectory with smooth guiding paths, increasing the recognition rate (Figure 3.8(b)).


However, smooth correction puts too much weight on the initial gesture path. We therefore use the current recognition rate as a weight to dynamically balance the two methods (Figure 3.8(c)). When the recognition rate is high, that is, when the user's gesture ink still closely matches the corresponding gesture template, the technique favors smooth correction, subtly redirecting the current position. Conversely, when the recognition score is low, the correction favors the relocated position: since the path has already deviated from the intended one, it ignores the previously accumulated offset errors and starts a new gesture trajectory.
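One way to express the three strategies in a single formula is as follows. Shift each point of the remaining template by the current offset d = gaze − (template start), keeping only a fraction w = 1 − s·t of that shift. Here t ∈ [0, 1] is the point's normalized arc-length position and s ∈ [0, 1] is the recognition score. With s = 0 the full shift is kept everywhere (relocation). With s = 1 the shift fades linearly to zero at the template's end, which is the linear interpolation of smooth correction. Intermediate scores blend the two. This parameterization, including the use of normalized arc length for t, is our assumption for illustration; the source does not give the exact formula.

```python
import math

def arc_params(path):
    """Normalised arc-length position (0..1) of each vertex along `path`."""
    cum = [0.0]
    for a, b in zip(path, path[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1] or 1.0
    return [c / total for c in cum]

def corrected_guide(remaining, gaze, score):
    """Blend relocation and smooth correction of the remaining template path.

    remaining: remaining template points in intended (uncorrected) coordinates,
               starting where the user *should* currently be.
    gaze:      current (filtered) gaze point.
    score:     recognition score; 1 -> smooth correction, 0 -> pure relocation.
    """
    s = min(max(score, 0.0), 1.0)
    dx, dy = gaze[0] - remaining[0][0], gaze[1] - remaining[0][1]
    guide = []
    for (x, y), t in zip(remaining, arc_params(remaining)):
        w = 1.0 - s * t          # share of the current offset kept at this point
        guide.append((x + w * dx, y + w * dy))
    return guide
```

In every case the guide starts exactly at the gaze point (t = 0), so the feedforward stays attached to the user's current position. Only its end point moves between the relocated and the intended trajectory.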

3.3.2 Task and Procedure

Figure 3.9: The interface and procedure of the study on solving the misestimation problem.

This study used the same environment as the pilot study; the interface is shown in Figure 3.9. In each trial, participants were instructed to perform a gaze gesture with one specific guiding technique. After moving their gaze into the trigger area at the center of the screen, they started generating a gesture path by pressing the space key, and held it until they were satisfied with the current form of the gesture ink. The recognizer used in this study is the $1 Recognizer, a 2D single-stroke recognizer based on an instance-based nearest-neighbor classifier with a Euclidean scoring function. After participants released the key, it compared the gesture ink with the corresponding gesture template, and the gesture result and output recognition rate were recorded.
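For readers unfamiliar with the $1 Recognizer, its pipeline runs in five steps:

1. Resample the stroke to 64 points.
2. Rotate it so that the angle from its centroid to its first point is zero.
3. Scale it non-uniformly to a reference square.
4. Translate its centroid to the origin.
5. Score the mean point-to-point distance d to each template as 1 − d / (½·√(size² + size²)).

This sketch simplifies the published algorithm: a coarse 1° sweep over ±45° replaces its golden-section search. N = 64 and size = 250 are values commonly used in the reference implementation, assumed here. The helper names are ours.

```python
import math

N, SIZE = 64, 250.0
HALF_DIAG = 0.5 * math.hypot(SIZE, SIZE)

def _resample(pts, n=N):
    """Equidistant resampling along arc length."""
    interval = sum(math.dist(a, b) for a, b in zip(pts, pts[1:])) / (n - 1)
    src, out, acc, i = list(pts), [pts[0]], 0.0, 1
    while i < len(src):
        a, b = src[i - 1], src[i]
        d = math.dist(a, b)
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            out.append(q)
            src.insert(i, q)
            acc = 0.0
        else:
            acc += d
        i += 1
    out += [src[-1]] * (n - len(out))     # pad if rounding left us short
    return out[:n]

def _centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def _rotate(pts, theta):
    """Rotate points by theta radians about their centroid."""
    cx, cy = _centroid(pts)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s + cx, (x - cx) * s + (y - cy) * c + cy)
            for x, y in pts]

def normalize(pts):
    pts = _resample(pts)
    cx, cy = _centroid(pts)
    pts = _rotate(pts, -math.atan2(cy - pts[0][1], cx - pts[0][0]))  # indicative angle -> 0
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    pts = [(x * SIZE / w, y * SIZE / h) for x, y in pts]             # non-uniform scale
    cx, cy = _centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]                         # centroid to origin

def path_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def best_distance(cand, tmpl, sweep_deg=45, step_deg=1):
    """Coarse angular sweep standing in for $1's golden-section search."""
    return min(path_distance(_rotate(cand, math.radians(a)), tmpl)
               for a in range(-sweep_deg, sweep_deg + 1, step_deg))

def recognize(stroke, templates):
    """Return (best template name, score in [0, 1]) for a single stroke."""
    cand = normalize(stroke)
    name, d = min(((n, best_distance(cand, normalize(t))) for n, t in templates.items()),
                  key=lambda item: item[1])
    return name, 1.0 - d / HALF_DIAG
```

The score returned here is the kind of "recognition rate" that the adaptive correction described earlier could use as its weight.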

Because this study focuses on the influence of guiding techniques on the recognition rate, participants were instructed to match the gesture template as closely as possible, without concern for completion time. Moreover, because saccades are the natural way for human eyes to complete straight-line movements, eye switching was neither encouraged nor restricted.

Guidance Techniques

Figure 3.10: The three guidance techniques. (a) Crib-sheet guide, showing only the already-input gesture ink. (b) Path Feedforward guidance. (c) Adaptive Path Feedforward Guidance.

We compared three guiding techniques (Figure 3.10).

(a) The first provided only feedback of the already-input gesture ink; users learned and performed the available gaze gestures with the help of an external crib-sheet guide.

(b) The second added feedforward of the upcoming path guidance at the current gaze point. A circular guide called the focus point sat at the end of the path, prompting users to put their focus on it and then gradually move their gaze along the path by tracking it.


Figure 3.11: A demonstration of the focus point of the adaptive feedforward guidance waiting at the corner.
