Experiment Procedures - Fixation and Rating Prediction for Texture Synthesis Images

Figure 3.2: This figure shows the texture categories of A, B, C, D, and E and their texture synthesis results. These textures are near-regular textures.

synthesis[17], or near-regular texture synthesis[18]. There are 44 synthesized textures in total.

Figures 3.2 and 3.3 show all the input and synthesized textures used in our experiment.

3.2 Experiment Procedures

At the beginning of the experiment, the subjects were asked to sit in a position from which they could comfortably look at the screen. The viewing distance from a subject to the screen was kept within 50-80 cm, which is the range accepted by the eye tracker. All subjects had to calibrate their eye positions. During calibration, a subject was asked to look at specific points on the screen. The resulting information is then integrated into the eye model, and the gaze point for each image sample is calculated. If the calibration accuracy met the requirement, we proceeded to the next step; otherwise the calibration was performed again or the subject was replaced.

After calibration, subjects were told that each following scene would contain two images: the left one is the input image for a texture synthesis algorithm, and the right one is the synthesis result for that input. Figure 3.4 shows what a subject actually saw on the screen during the experiment. The subjects were then asked to give a score between 1 and 5 (1 representing the least satisfactory and 5 the most satisfactory) for each right image, according to the quality of the synthetic texture compared to the input texture.

Finally, we began the recording process. Each pair of images appeared on the screen for 10 seconds. After the images disappeared, a text prompt asking the subject to give a score was shown on the screen. After the subject gave a score, the next pair of images appeared. The pairs were shown in random order. After all 44 pairs of textures had been shown, the recording was complete, and we obtained the eye-movement data of the subjects while they were rating each synthetic texture. Figure 3.5 illustrates our experiment process.
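The bookkeeping of one recording session can be sketched as follows. This is only an illustration of the protocol described above (44 pairs, random order, 10 seconds per pair, scores from 1 to 5); the function names, the `get_score` callback, and the assumption that scores are integers are hypothetical and not taken from the thesis, and the actual display and eye-tracker calls are omitted.

```python
import random

def build_trial_order(num_pairs=44, seed=None):
    """Return the texture-pair indices in a shuffled presentation order."""
    rng = random.Random(seed)
    order = list(range(num_pairs))
    rng.shuffle(order)
    return order

def validate_score(score):
    """Accept only integer ratings on the 1-5 scale (integer scores are an assumption)."""
    if not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError("score must be an integer between 1 and 5")
    return score

def run_session(get_score, num_pairs=44, display_seconds=10, seed=None):
    """Simulate one session; get_score(pair_index) stands in for the subject's answer."""
    records = []
    for pair_index in build_trial_order(num_pairs, seed):
        # In the real experiment the pair would be displayed for display_seconds
        # while the eye tracker records gaze; only the bookkeeping is sketched here.
        score = validate_score(get_score(pair_index))
        records.append({"pair": pair_index,
                        "display_seconds": display_seconds,
                        "score": score})
    return records
```

Seeding the shuffle is a convenience for reproducing a presentation order when debugging; an actual session would normally leave `seed` unset.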


Figure 3.3: This figure shows the texture categories of F, G, H, I, J, and K and their texture synthesis results. Textures G, I, and K are irregular textures. The remaining ones are near-regular textures.


Figure 3.4: This image shows what subjects really see on the screen.

Figure 3.5: Procedure of the experiment. Each subject was asked to sit in a comfortable position and then calibrate the eye tracker. Next, we showed the subject some sample images to familiarize them with the environment. After these steps, the main recording started: the subject observed each image for 10 seconds and was then asked to give a score for the synthesis result. The images appeared in random order. After all the images had been observed and scored by the subject, we obtained the eye-movement data recorded during the observation and the scores for each image.

CHAPTER 4

Our Approach

To evaluate the quality of a texture perceptually, we present two models in this thesis: a visual attention model and a perceptual rating model. Both involve two stages: training data collection and learning. We will explain how to prepare the data and build the SONFIN model. When a texture is viewed, the visual attention model simulates human fixation, and the perceptual rating model predicts the score. We train these models with the ground-truth data recorded in the eye-tracking experiment. In the following sections, we describe our feature extraction approach, the SONFIN model, and the proposed visual attention and perceptual rating models.

4.1 Feature Extraction

Inspired by Peters et al.[9], we develop our model with a bottom-up stage and a top-down stage. The bottom-up stage defines features of synthesized textures, and the top-down stage contains the subjects’ fixation data. Judd et al.[11] show that we may sample positive-fixation parts and negative-fixation parts as input to train our model.

Figure 4.1: Example of structural textures. The left one is input texture and the right four are synthesized textures.

The goal of our experiment is to develop a model to evaluate textures. The results of Lin et al.[1] show that most regularity-preserved textures receive higher user scores than regularity-broken textures. This implies that the regularity of structure has a stronger effect than the color or intensity of textures when humans judge synthetic structural textures, as shown in Figure 4.1.

Figure 4.2: (a) The original images of the two textures: the left one is the input texture, and the right one is the synthesis result from the input texture. (b) The edge images of (a). In (b), we calculate the difference between $R_i$ and $I_j$, the patch most similar to $R_i$ in the edge image of the input texture. We visualize these error values as a structural error map, as shown in (c). Brighter regions mean larger error. (d) The gradient of the structural error, obtained by computing the gradient of (c).

An example-based texture synthesis algorithm requires an input texture to synthesize a larger texture, so we extract features by comparing the synthesized texture with the input texture. Figure 4.2 illustrates our feature extraction process. Since structure is the most important feature we need, we apply a Gaussian filter to remove details and noise, which would affect the extraction of structural features. We use the Canny edge detection algorithm to find the edges of the textures. To obtain the best extraction result, we also adjust the sigma and threshold values of the Canny edge detector for different textures. After detection, the edge regions are denoted as 1 and the other regions as 0. We then dilate the detected edges, so that a small shift of the edges does not affect the structural error too much.
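The effect of the dilation step on slightly shifted edges can be illustrated with a small sketch. The square structuring element, its radius, and the 7x7 toy edge maps are assumptions made for the illustration; the thesis does not state which structuring element was used. The Gaussian filtering and Canny steps are not reproduced here.

```python
import numpy as np

def dilate(edges, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    padded = np.pad(edges.astype(bool), radius, mode="constant")
    h, w = edges.shape
    out = np.zeros((h, w), dtype=bool)
    # OR together every shifted copy of the edge map inside the square window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out.astype(np.uint8)

# Two binary edge maps containing the same vertical line, one pixel apart.
a = np.zeros((7, 7), dtype=np.uint8)
b = np.zeros((7, 7), dtype=np.uint8)
a[:, 3] = 1
b[:, 4] = 1

raw_overlap = int(np.sum(a & b))                      # thin edges share no pixels
dilated_overlap = int(np.sum(dilate(a) & dilate(b)))  # thickened edges share two columns
print(raw_overlap, dilated_overlap)  # prints "0 14"
```

Without dilation the two one-pixel-wide lines do not coincide anywhere; after dilation they share a band of pixels, so a comparison of the two maps no longer treats a one-pixel shift as a completely different structure.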

Figure 4.3: (a) shows some patches on the edge image of the synthesized texture. (b) shows some patches on the edge image of the input texture.

We measure the structural error of a synthesized texture by comparing the edges detected in the synthesized texture with those in the input texture. We acquire many image patches from a synthesized texture by sampling it uniformly. For a near-regular texture, the patch size equals the size of the tile [18]; for an irregular texture, the patch size is manually set to the average size of its texture elements. We denote the patches of the synthesized texture as $R_1, R_2, \ldots, R_i, \ldots, R_n$. These patches are 10 pixels apart along the horizontal and vertical directions. The coordinates of the center of $R_i$ are larger than 0 and less than $W_R$ and $H_R$, where $W_R$ and $H_R$ denote the width and height of the synthesized texture, respectively. Figure 4.3 (b) shows this sampling process.

The center of $R_1$ is at the top-left corner of the image, and the next patch $R_2$ is 10 pixels to its right. After sampling all the patches in the first row, we move 10 pixels downward to the next row and continue the same process until we have sampled all the patches, up to $R_n$, in the texture.
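The row-by-row sampling order can be sketched as a grid of patch centers. The function name is hypothetical, and starting exactly at pixel (0, 0) is an assumption about what "top-left corner" means here.

```python
def patch_centers(width, height, stride=10):
    """Row-major grid of patch centers (x, y), starting at the top-left corner."""
    return [(x, y)
            for y in range(0, height, stride)   # move down one row at a time
            for x in range(0, width, stride)]   # sweep each row left to right

centers = patch_centers(width=35, height=25)
print(len(centers))   # prints 12 (4 columns x 3 rows)
print(centers[:2])    # prints [(0, 0), (10, 0)]
```

The same function with `stride=1` gives the pixel-by-pixel scan used when searching the input texture for the best match.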

To compute the structural error of each $R_i$, we compare it with the edge image of its input texture. For each $R_i$, we find the best-matched patch in the edge image of the input texture. We denote the patches in this edge image as $I_1, I_2, \ldots, I_j, \ldots, I_m$ (with the same patch size as $R_i$). $W_I$ and $H_I$ denote the width and height of the input texture. The coordinates of the center of $I_j$ should be larger than 0 and less than $W_I$ and $H_I$. Figure 4.3 (a) illustrates the patches in the edge image of the input texture. Note that the center of $I_j$ is moved pixel by pixel when searching for the best match.


The center of $I_1$ is at the top-left corner of the input texture, and we move 1 pixel to the right to get the next patch $I_2$. After sampling all the patches in the first row, we move 1 pixel downward and continue the same process until all the patches in the input texture are sampled (see Figure 4.3 (a)).

For each $R_i$, the comparison calculates $\lVert R_i - I_j \rVert$ for all $I_j \in \{I_1, I_2, \ldots, I_m\}$, that is, the difference between the values of $R_i$ and $I_j$ followed by the norm of $(R_i - I_j)$.

We compare $R_i$ with every possible patch in the input texture, including patches that are not entirely inside the input texture, to make sure that each $R_i$ finds its smallest structural error value.

Because the portion of a patch $I_j$ that lies outside the input texture would bias the computation of the structural difference $\lVert R_i - I_j \rVert$, we multiply $\lVert R_i - I_j \rVert$ by a mask that marks the valid region.

Another purpose of the mask is to calculate the area of the valid region. We divide the structural difference by the area of the valid region of a patch, which prevents the area from biasing the computation of the structural difference:

$$\min_{j} \frac{\mathrm{Sum}\left(\lVert R_i - I_j \rVert * \mathrm{Mask}_j\right)}{\mathrm{area}(\mathrm{Mask}_j)}, \quad j \in \{1, 2, \ldots, m\}, \tag{4.1}$$

where $\mathrm{Sum}(\cdot)$ denotes the sum of all elements of a matrix, $\lVert \cdot \rVert$ is the absolute value, $*$ is the element-wise product of two matrices, and $\mathrm{Mask}_j$ marks the valid region. We record this value in a matrix as the structural error map.
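A minimal sketch of Equation (4.1) for a single patch follows. It assumes that the part of a candidate patch falling outside the input texture is handled by zero padding together with a mask of ones over the valid pixels; the function name and this padding scheme are illustrative choices, not details given in the thesis.

```python
import numpy as np

def structural_error(patch, input_edges):
    """Minimum masked mean absolute difference over all placements (Eq. 4.1)."""
    ph, pw = patch.shape
    h, w = input_edges.shape
    top, left = ph // 2, pw // 2
    # Zero-pad the input so a patch centered on any input pixel can be cut out,
    # and pad an all-ones array the same way to obtain the validity mask.
    pad = ((top, ph - 1 - top), (left, pw - 1 - left))
    padded = np.pad(input_edges.astype(float), pad)
    valid = np.pad(np.ones((h, w)), pad)
    best = np.inf
    for cy in range(h):            # move the candidate center pixel by pixel
        for cx in range(w):
            candidate = padded[cy:cy + ph, cx:cx + pw]
            mask = valid[cy:cy + ph, cx:cx + pw]
            diff = np.abs(patch - candidate) * mask   # ignore pixels outside the input
            best = min(best, diff.sum() / mask.sum()) # normalize by the valid area
    return best
```

Because the center of each candidate always lies inside the input texture, the mask contains at least one valid pixel and the division is well defined. Calling this function once per sampled patch $R_i$ and storing the results at the patch positions would give the structural error map.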

After we obtain the structural error map of each synthetic texture, we normalize the maps among all textures synthesized from the same input texture to form the normalized matrix $E$, as shown in Figure 4.2(c).
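The thesis does not state which normalization is used. One plausible reading, sketched below purely as an assumption, is to divide every error map in a group by the largest error found in any map synthesized from the same input texture, so that errors stay comparable across the group.

```python
import numpy as np

def normalize_group(error_maps):
    """Scale all error maps from one input texture by their shared maximum."""
    peak = max(float(m.max()) for m in error_maps)
    if peak == 0.0:
        # All maps are error-free; avoid dividing by zero.
        return [np.zeros_like(m, dtype=float) for m in error_maps]
    return [m / peak for m in error_maps]

maps = [np.array([[0.0, 2.0], [4.0, 1.0]]),
        np.array([[8.0, 0.0], [2.0, 6.0]])]
normalized = normalize_group(maps)
print(normalized[0].max(), normalized[1].max())  # prints "0.5 1.0"
```

Normalizing each map by its own maximum instead would hide the fact that some synthesis results are much worse than others, which is why a shared scale is assumed here.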


Besides the structural error, the gradient of the error along the x-axis and y-axis dominates the viewer’s judgments. We obtain this feature by computing the gradient of $E$.

$$\nabla E = \left(\frac{\partial E}{\partial x}, \frac{\partial E}{\partial y}\right)$$
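On a discrete error map, the gradient along each axis can be approximated with finite differences. The sketch below uses `numpy.gradient` (central differences in the interior, one-sided differences at the borders); this discretization and the function name are assumptions, while the names `G_x` and `G_y` follow the $G^x$, $G^y$ notation of the text.

```python
import numpy as np

def error_gradient(E):
    """Return (G_x, G_y), the finite-difference gradient of the error map E."""
    # np.gradient returns derivatives in axis order: rows (y) first, then columns (x).
    G_y, G_x = np.gradient(np.asarray(E, dtype=float))
    return G_x, G_y

# A linear ramp has a constant gradient, which makes the result easy to check.
ys, xs = np.mgrid[0:4, 0:5]
G_x, G_y = error_gradient(2 * xs + 3 * ys)
print(np.allclose(G_x, 2), np.allclose(G_y, 3))  # prints "True True"
```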

In addition, human visual habits affect the result. Hence, the gaze position is considered an important feature. We define two feature vectors. The first collects the normalized structural error $E$ and the position $X, Y$,

Φ =

The second feature vector consists of the gradient of structural error map G^x, G^y and the
