Transformation - Preprocessing Stage - A Study on Realizing Drawings and Cartoon Portraits via a Photograph Database (經由照片資料庫實現繪圖與卡通人像真實化之研究)

Chapter 3. Preprocessing Stage

Using Equation (3-5), we take the 61 landmarks mentioned above as control points for a Catmull-Rom spline; Figure 3.2 shows the result.

Like all automatic techniques, STASM and our forehead feature point finder may not always be as accurate as a human landmarker, so some feature points may still be placed incorrectly. We therefore provide user interaction in this step: users can adjust the feature points to more accurate positions.

3.2 Transformation

Perspective Projection Warping

Image warping is a way of rearranging the pixel positions of an image by applying a geometric transformation. The technique is often used in both image processing and computer graphics (e.g., texture mapping, image mosaicing). Warping a source image into a destination image requires a mapping function between source space (u, v) and destination space (x, y).

We define the source image and the target image as I and I′. The corresponding points are the facial feature points extracted by the methods described above. A feature point in I is defined as p_i = (u_i, v_i)^T, and its desired new position in the destination image as q_i = (x_i, y_i)^T, i = 1, …, n.

The goal of image warping is to find a transform function f subject to f(p_i) = q_i. For each pixel p, I′(f(p)) = I(p), where I(p_i) and I′(q_i) are the intensities or colors of the images.

Here we use projective mapping, also known as the perspective or homogeneous transformation. It projects points from one plane onto another plane:

 

Each point correspondence generates two equations:

A projective mapping has 8 degrees of freedom, so at least 4 correspondences are needed to solve for it. After rearranging the equations above, the system can be rewritten in homogeneous linear form, and the solution is the singular vector corresponding to the smallest singular value.
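As an illustration of the solving step, the following is a minimal Python sketch, not the thesis implementation. It assumes exactly four correspondences and uses the entry names h11 to h33, which do not appear in the original. Instead of taking the singular vector of the smallest singular value, it fixes h33 = 1 and solves the resulting 8×8 linear system by Gaussian elimination, a simpler substitute that works whenever h33 is nonzero. The function names `solve_linear`, `estimate_homography`, and `apply_homography` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(src, dst):
    """Return a 3x3 matrix H mapping each (u, v) in src to (x, y) in dst."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("this sketch expects exactly four correspondences")
    rows, rhs = [], []
    for (u, v), (x, y) in zip(src, dst):
        # Assumed form: x = (h11*u + h12*v + h13) / (h31*u + h32*v + 1),
        # and likewise for y; each correspondence yields two linear equations.
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Map a point through H and divide by the homogeneous coordinate."""
    u, v = point
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return ((H[0][0] * u + H[0][1] * v + H[0][2]) / w,
            (H[1][0] * u + H[1][1] * v + H[1][2]) / w)
```

With more than four correspondences the system becomes overdetermined, and a least-squares or singular value decomposition solution would be needed instead.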

We choose projective warping over other methods for the following reason. In our case, we need to warp one face onto another, so we face perspective distortion caused by different viewpoints. Other warping methods may only be applicable to view angle differences of up to 15 degrees.

After deriving the mapping function from the source image to the destination image, we need to warp each point of the source image to the destination image. With forward mapping, each transformed source point has real-valued coordinates that must be rounded to integer coordinates, which may introduce error and misalignment.

Therefore, we compute the inverse of the mapping matrix, which maps from the destination image to the source image. Instead of forward mapping, we then use backward mapping, also called inverse mapping: every point in the destination image is filled with the intensity from the source image. Backward mapping avoids the round-off error and the misalignment problem.
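The backward mapping loop can be sketched as follows. This is a minimal Python illustration under stated assumptions: the image is a list of rows, `inv_map` is a hypothetical callable standing in for the inverse mapping matrix, and the source is sampled with nearest-neighbor rounding for brevity.

```python
import math


def warp_backward(src, inv_map, width, height, fill=0):
    """Fill every destination pixel by looking up its source position."""
    src_h, src_w = len(src), len(src[0])
    dst = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u, v = inv_map(x, y)  # real-valued source coordinates
            ui, vi = math.floor(u + 0.5), math.floor(v + 0.5)
            if 0 <= ui < src_w and 0 <= vi < src_h:
                dst[y][x] = src[vi][ui]
    return dst


# Example: shift a 2x2 image one pixel to the right on a 3x2 canvas.
shifted = warp_backward([[1, 2], [3, 4]], lambda x, y: (x - 1, y), 3, 2)
```

Because the loop visits each destination pixel exactly once, no destination pixel is left unassigned, apart from those whose source position falls outside the source image.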

Local Warping

Even though perspective warping solves the problem of viewpoint change, it does not guarantee perfectly aligned correspondences. Therefore, we address this problem by applying local warping in the following step.

Each pixel q is transformed to pixel p by the projective mapping function described above, that is, p = T(q). For more accurate alignment, we find the new position p′ with a local warping function f. The adjusted position is calculated by:

p′ = f(p) (3-9)

There are m pairs of corresponding feature points. Feature point s_j from the source image corresponds to feature point d_j from the destination image, and together they form one pair. If the projective mapping function were exact, we would have d_j = T(s_j), and point s_j would be transformed to point d_j. In practice, however, d_j′ = T(s_j) with d_j′ ≠ d_j, so point s_j is warped to point d_j′ instead of point d_j.

The last term in Equation (3-10) is the translation between d_j and d_j′. The c in Equation (3-11) is a small constant to avoid zero, and η_j is the weight for the translation. Equation (3-12) is for normalization.
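Equations (3-10) to (3-12) are not reproduced in this text, so the following Python sketch rests on an assumed form: each pixel is shifted by a weighted sum of the translations d_j − d_j′, where the raw weight is the inverse of the distance from p to d_j′ plus the small constant c, and the weights η_j are normalized to sum to one. The function name `local_warp` and the default value of `c` are hypothetical.

```python
import math


def local_warp(p, targets, warped, c=1e-3):
    """Shift p by a normalized, inverse-distance-weighted sum of d_j - d_j'.

    targets holds the desired points d_j; warped holds the points d_j'
    actually produced by the projective mapping.
    """
    raw = [1.0 / (math.dist(p, dw) + c) for dw in warped]
    total = sum(raw)
    x, y = p
    for weight, d, dw in zip(raw, targets, warped):
        eta = weight / total  # normalized weight of feature pair j
        x += eta * (d[0] - dw[0])
        y += eta * (d[1] - dw[1])
    return (x, y)
```

Under this assumption, a pixel lying on a warped feature point d_j′ is moved almost exactly onto d_j, while pixels far from every feature point receive a blend of all translations.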

The coordinates of pixel p′ may not lie exactly on the grid, so directly sampling the color of a nearby pixel is not accurate enough. Therefore, we use the adjacent left, right, upper, and lower pixels t_h to interpolate the mapped color, deriving the color of p′ from the RGB values of these four adjacent pixels. The weight of each pixel t_h is calculated from its distance to p′. Equation (3-15) is for normalization.
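The interpolation can be sketched in Python as follows. The exact weighting formula is not reproduced in this text, so the sketch assumes inverse-distance weights over the four grid pixels surrounding p′, normalized to sum to one. The function name `sample_color` and the constant `eps` are hypothetical.

```python
import math


def sample_color(image, x, y, eps=1e-6):
    """Interpolate an (r, g, b) color at the real-valued position (x, y).

    image[row][col] is an (r, g, b) tuple. The four surrounding grid
    pixels are weighted by inverse distance, then normalized.
    """
    x0, y0 = math.floor(x), math.floor(y)
    height, width = len(image), len(image[0])
    weights, colors = [], []
    for nx, ny in ((x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)):
        if 0 <= nx < width and 0 <= ny < height:
            weights.append(1.0 / (math.hypot(x - nx, y - ny) + eps))
            colors.append(image[ny][nx])
    if not weights:
        raise ValueError("position lies outside the image")
    total = sum(weights)
    return tuple(sum(w * c[k] for w, c in zip(weights, colors)) / total
                 for k in range(3))
```

At the center of a grid cell all four weights are equal, so the result is the mean of the four colors; near a grid point that pixel dominates.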





In Figure 3.3, image (a) is the destination image and image (b) is the source image; we warp image (b) onto image (a) using the facial feature points. Figure 3.3 (c) shows that the result of perspective projection warping is visually satisfactory. However, the corresponding positions are not exactly aligned, as can be seen from the shape and position of the eyebrows, eyes, and mouth. Local warping adjusts image (c) slightly, after which the correspondences are precisely aligned.

Figure 3.3 (a) The target image (b) The source image (c) The result of perspective projection warping (d) The result after local warping

Chapter 4 Realizing Cartoon and Painting Faces

After aligning the database photographs to the input face, we can use graph-cut-based multi-label optimization [14] to find the best-matched patches of arbitrary shape. The matching goal in our method is to find the highest similarity in color and edges while keeping local spatial affinity. Once the most similar face has been pieced together from the database, we apply two procedures to remove visible discontinuities between the patches on the face. Gain compensation, which tunes the intensity of each patch, is used to lessen the color differences between adjacent patches. Finally, we separate the patch boundaries into multiple levels and blend each channel separately to generate realistic and novel faces.

4.1 Graph-cut-based Multi-label Optimization

Graph-cut is a method that finds an optimal node partition. It is equivalent to max-flow min-cut. In computer vision, graph cut methods are often used to tackle optimization of discrete elements, like pixels. For instance, graph-cut can segment the image into the background and foreground regions.

Graph-cut regards node partitioning as a labeling problem. For example, in image segmentation it labels the foreground pixels as one and the background pixels as zero, which is a two-label problem. To find the best partition for an input image, graph-cut solves the problem by maximizing the flow according to the edge capacities.

When we assign a label to each pixel, labels should have spatial affinities while preserving possible sharp discontinuities. These tasks are stated as energy minimization.

Finding the global minimum of the energy function is an NP-hard problem. Therefore, Boykov et al. [14] proposed alpha-expansion, an efficient approximation algorithm.

Later, more papers applying multi-label optimization were presented. However, most of them segment different regions of an image. The labels in our work represent the region indices of image pixels, so the patches can have arbitrary shapes.

We have several requirements for portrait realization. We want each pixel of the input image to come from the corresponding position of the most similar database photograph as its source. In addition, the synthesized face should be both similar to the input image and realistic. Multi-label optimization therefore matches our needs. The labels in our case refer to the photograph IDs in the database.

Figure 4.1 Matching multi-labels to our problem

As Figure 4.1 shows, the labels in our case are the image indices in the database. Each pixel in the input cartoon or painting image can be matched to the pixels at the same aligned position in all photographs in the database. A corresponding pixel is a pixel that has the same coordinates as the input image pixel after alignment with the feature points. Each input image pixel picks one label (photo ID); the label a pixel picks determines which source image it comes from. Below, we discuss how the pixels pick the most suitable labels.

It is obvious that we should compare the corresponding pixel colors to determine which label to choose. The greedy method is to find the pixel labels with minimum color difference.

However, if we only compare pixel color differences, the result will be almost the same as the original input image. It loses the realism of faces and creates discontinuities in the face, so this approach does not fit our problem.
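The greedy method can be sketched in a few lines of Python. This is a minimal illustration, assuming the pixels are given as flat lists of (r, g, b) tuples and that the color difference is a sum of absolute channel differences; the function name `greedy_labels` and this choice of difference are assumptions, not taken from the original.

```python
def greedy_labels(input_pixels, database):
    """Pick, for each pixel, the photo whose aligned pixel is closest in color.

    database[label][i] is the aligned color of pixel i in photo `label`.
    """
    def diff(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    return [min(range(len(database)), key=lambda f: diff(px, database[f][i]))
            for i, px in enumerate(input_pixels)]


# Two adjacent pixels end up with labels from two different photographs.
labels = greedy_labels(
    [(10, 10, 10), (200, 200, 200)],
    [[(0, 0, 0), (190, 190, 190)], [(12, 12, 12), (0, 0, 0)]],
)
```

Since every pixel decides independently, neighboring pixels can switch sources freely, which is the fragmentation described above.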

To preserve the realism of the face, we expect the selected labels to carry local, realistic details from real faces. Therefore, individual pixel labels should gather into patches, and we consult the labels and color differences of adjacent pixels. Figure 4.2 shows an example of what happens if the label is determined only by the pixel's own color difference.

Node p would choose its most similar label, label 3, while its neighbors choose label 2.

However, label 2 is node p's second most similar label. In this case we want node p to choose label 2, following its neighbors, and to form a group with an identical label, called a patch. This approach preserves the realism and smoothness of the face. Therefore, our energy function considers both the color difference of each pixel and the neighbor affinity of labels according to the color differences between neighbors.

Figure 4.2 Node p and 4-neighbors

TABLE I

GRAPH-CUT-BASED MULTI-LABEL OPTIMIZATION SCHEME

1. Perform Canny edge detection on the input image and each database image.

2. Apply a Gaussian filter to propagate the range of the edges.

3. Set up the neighborhoods for the smooth term and the symmetry term.

4. Iteratively solve our energy optimization function by alpha-expansion, using step 5.

5. Compute the data term, smooth term, symmetry term, and edge term penalties.

We formulate our task as a labeling problem on a Markov Random Field (MRF). The input image I is represented by a weighted graph G = (P, N). Each pixel stands for one node p ∈ P in the graph G. Pairwise adjacent pixels are linked by an edge {p, q} ∈ N; 4-neighbors are used here. The MRF can then be formulated as the following energy function:

 

The graph-cut-based multi-label optimization method is briefly described in Table I.

Data term

The first term in Equation (4-1) is known as the data term, as illustrated in Figure 4.3. As mentioned above, each pixel of the input image chooses its label by comparing its color with the corresponding pixels. For location p, we calculate the difference between the intensity of the input image and that of the database photograph with label f. As described before, a label in our case represents the index of a database photograph. The data term ensures that pixel p in the photograph with label f is similar to pixel p in the input image. It gives a large penalty when pixel p in the photograph with label f is too different from the observed pixel p. The data term D_p in Equation (4-1) can be defined as:

where the normalization constant scales the color difference to the range from zero to one.

The data term penalty prevents each pixel from being matched with a very dissimilar pixel from a database photograph. The more dissimilar the labeled pixel is, the larger the penalty.

Therefore, it keeps the result similar to the input image. However, preserving intensity alone fragments the image into discontinuous, separate crumbs from many sources, which lowers the realism of the resulting faces.

Figure 4.3 The illustration for data term and smooth term

Smooth term

The smooth term is the second penalty term in Equation (4-1); it retains neighbor affinity. As shown in Figure 4.2, it is based on a 4-connected neighborhood system. The purpose of the smooth term is to keep the local labeling smooth, so it penalizes two neighboring pixels p and q if their colors are similar in the input image but they are assigned different labels. f_p and f_q denote the labels assigned to pixels p and q.

The smooth penalty is defined as follows:

 

The first rule states that the energy is zero when the two labels are the same. The second rule states that the energy between adjacent nodes does not depend on the order of the two labels. The last rule is the triangle inequality: the energy of the direct path is never greater than that of an indirect path.

The smooth term of our energy function is defined below:

cartoon image. The constant w is for range normalization.

Figure 4.3 illustrates the smooth term. If node p and node q have the same label, meaning that they take their intensity data from the same source, the transition between them is presumably continuous, so we give no penalty for this pair of nodes. When node p chooses a different label from its neighbor q, we penalize the discontinuity by Equation (4-4). If the color difference between pixel I_{f_p}(p) and pixel I_{f_q}(q) is large, the penalty for node p and node q is large because of the strong discontinuity.
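To make the interplay of the data term and the smooth term concrete, the following Python sketch evaluates the energy of a given labeling on a small grayscale grid. Equations (4-1) to (4-4) are not reproduced in this text, so the forms used here are assumptions: the data term is the absolute difference between the input pixel and the labeled photograph pixel divided by a normalization constant, and the smooth term is the normalized difference between I_{f_p}(p) and I_{f_q}(q) when the labels differ and zero otherwise. The function name `energy` and the parameter `norm` are hypothetical.

```python
def energy(labels, input_img, photos, norm=255.0):
    """Sum assumed data and smooth penalties over a 4-connected grid.

    labels and input_img are 2-D lists; photos[f] is a 2-D list aligned
    to input_img, holding the grayscale values of database photo f.
    """
    height, width = len(labels), len(labels[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            f = labels[y][x]
            # Data term: how far the chosen photo pixel is from the input.
            total += abs(input_img[y][x] - photos[f][y][x]) / norm
            # Smooth term: visit right and lower neighbors so that each
            # neighboring pair is counted once.
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < width and ny < height:
                    g = labels[ny][nx]
                    if f != g:
                        total += abs(photos[f][y][x] - photos[g][ny][nx]) / norm
    return total


inp = [[100, 140, 100]]
photos = [[[100, 130, 100]], [[200, 140, 200]]]
greedy = energy([[0, 1, 0]], inp, photos)    # center pixel follows its own best match
coherent = energy([[0, 0, 0]], inp, photos)  # center pixel follows its neighbors
```

In this toy case the per-pixel best labeling pays two smooth penalties, while the coherent labeling pays only a small data penalty at the center pixel, so the coherent labeling has the lower energy.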

Symmetry term

With the data term and the smooth term, we get satisfactory results in most cases.

However, human eyes are sensitive to faces, especially to the symmetry of human appearance. In certain cases, the results might not satisfy users because of asymmetric eyes or other features.

Therefore, we address the symmetry problem through a third kind of penalty, called the symmetry term.

Huang et al. [18] proposed an idea in the paper “RepSnapping”. In addition to the data term and the smooth term, they add a new term, the repetition term, to the graph-cut energy function. This term links additional neighbors in RGB color space, even though the linked nodes may be far apart spatially. With this term, “RepSnapping” is able to segment similar objects in an image even if the objects are spatially far from each other.

Figure 4.4 Symmetry Neighborhoods

Figure 4.5 The illustration for symmetry term

On the other hand, we propose adding a symmetry term to our energy function. It is used to connect symmetric facial features even if they are spatially separated. For instance, we can make new links between the left eye and the right eye, and also link the left eyebrow and right eyebrow nodes to

The symmetry term of our energy function is defined below:

 



is a trade-off weight. The constant w is for range normalization.

Figure 4.5 shows the links of the symmetry term on the graph. For a pair of symmetric nodes p and r, which come from the same person, we assume that the eyes are symmetric if both nodes have the same label, so we give no penalty in this case. If they have different labels, we give a penalty, because choosing from different sources may cause asymmetry. If the two nodes choose different sources but have similar colors, the penalty is small. However, if they choose different sources and have a large color difference, the penalty is large because the result would look bad.
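The behavior described for the symmetry term can be sketched in Python as follows. The defining equation is not reproduced in this text, so the form is an assumption: no penalty for equal labels, and otherwise a trade-off weight times the normalized color difference between the two labeled photograph pixels. The names `symmetry_penalty`, `lam`, and `norm`, and the default weight, are hypothetical.

```python
def symmetry_penalty(labels, photos, pairs, lam=0.5, norm=255.0):
    """Sum the assumed symmetry penalty over linked node pairs.

    pairs is a list of ((row_p, col_p), (row_r, col_r)) links between
    mirrored facial positions; photos[f] is grayscale photo f.
    """
    total = 0.0
    for (py, px), (ry, rx) in pairs:
        fp, fr = labels[py][px], labels[ry][rx]
        if fp != fr:
            # Different sources: penalize in proportion to the color gap.
            total += lam * abs(photos[fp][py][px] - photos[fr][ry][rx]) / norm
    return total


photos = [[[100, 0]], [[0, 151]]]
link = [((0, 0), (0, 1))]
same = symmetry_penalty([[0, 0]], photos, link)
different = symmetry_penalty([[0, 1]], photos, link)
```

Here `same` is zero because both linked nodes take the same source, while `different` grows with the color gap between the two sources.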

Edge Penalty

Moreover, another new term, the edge penalty, is added to our energy function. We merge this edge penalty into the data term. The original data term, shown in Equation (4-2), compares only color differences.

Figure 4.6 The overview for preprocessing the edge information

Figure 4.7 Gaussian filter to propagate the range of edge

The purpose of the edge penalty is to deal with discontinuous edges on a face synthesized from multiple sources. As mentioned before, the Visio-lization algorithm [4] uses patches of fixed size and shape, so edges on the face may not be continuous across patches. We address this problem by using patches of arbitrary size and shape, which solves most of the discontinuous-edge problems. However, adding the edge penalty performs even better.

We use the Canny edge detector to find the edges in the input cartoon image and in all database photographs, as shown in Figure 4.6. Pixels that an edge crosses have value one; other pixels have value zero. Next, we use a Gaussian filter to propagate the range of the edges. After filtering, pixels near an edge have values close to one and pixels farther away approach zero, as can be seen in Figure 4.7: the middle of the edge has the highest value, and the values progressively decrease to zero. Propagating the range of the edges enlarges the edge region, because the edges of a database image may not lie exactly on the edges of the input image.

Besides, the gradual change of edge intensities helps the optimization process move the parameters progressively over multiple iterations. The new data term after adding the edge penalty becomes:



is a weight, which decides the proportion of edge penalty in energy function. With variable size and shape patches we can prevent the face from discontinuity in most of the cases.

We use Figure 4.8 to describe several cases of the edge penalty, where the three pixels are all assigned the same label K. The location p in the three cases is denoted p1 to p3 in the figure. A white pixel has the highest value, one, which is the strongest edge, and a black pixel has the lowest value, zero, which is far from any edge. Gray pixels have intermediate values, such as 0.49 and 0.85. In the first case, p1 in the input image is on the edge center, which has edge value one, and it chooses label K. In image K, p1 is also on the edge center, so there is no penalty. Matching an edge center to an edge center is a good match.

In the second case, location p2 is on the edge center in the input image and is matched to the close surroundings of an edge. Although it does not match the edge center exactly, it is unlikely to cause a discontinuity in the result, so it gets a low penalty. Case 3 describes an edge pixel matched to a pixel with no edge crossing and far from any edge, which is very likely to cause a discontinuity. Therefore, case 3 gets a large penalty.
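The edge propagation and the three cases can be illustrated with a one-dimensional Python sketch. It assumes a binary edge row as input (the Canny step is omitted), a Gaussian kernel scaled so that an isolated edge pixel keeps the value one, and an edge penalty equal to the absolute difference of the propagated edge values at the same position. These forms, the value of `sigma`, and the function names are assumptions, since the modified data term is not reproduced in this text.

```python
import math


def propagate_edges(edge_row, sigma=1.5):
    """Spread a binary edge row with a Gaussian whose peak value is one."""
    out = []
    for i in range(len(edge_row)):
        s = sum(e * math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
                for j, e in enumerate(edge_row))
        out.append(min(1.0, s))  # clip so edge centers stay at one
    return out


def edge_penalty(input_edges, photo_edges, p):
    """Assumed penalty: gap between propagated edge values at position p."""
    return abs(input_edges[p] - photo_edges[p])


def edge_at(i, n=21):
    """Binary row of length n with a single edge pixel at index i."""
    return [1.0 if j == i else 0.0 for j in range(n)]


inp = propagate_edges(edge_at(4))
case1 = edge_penalty(inp, propagate_edges(edge_at(4)), 4)   # center to center
case2 = edge_penalty(inp, propagate_edges(edge_at(5)), 4)   # center to near edge
case3 = edge_penalty(inp, propagate_edges(edge_at(20)), 4)  # center to no edge
```

The three penalties come out ordered as in the cases above: zero for a center-to-center match, a small value for a near miss, and nearly one when the photograph has no edge nearby.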

Figure 4.9 (b) is the result without the edge term. The lower lip does not match the edge well across patches, so the lip line is not smooth enough. After we add the edge term to our energy function, the lip line is preserved well and is smooth, as in Figure 4.9 (c). Figure 4.9 (d) shows that under a different parameter setting, even where different patches meet, the result is still satisfactory.

Figure 4.8 Cases of edge term penalty

Figure 4.9 (a) The cartoon mouth (b)(c) Results without and with the edge term (d) Another result with the edge term

4.2 Seamless Patch Stitching Stage

In seamless patch stitching stage, we want to solve the problem of the intensity gaps between adjacent patches from different sources. Every photograph has different color tones, so the patches we retrieve have intensity gaps between them. Therefore, we use gain compensation and multi-level blending to seamlessly stitch the patches.

Chromatic Gain Compensation

Gain compensation was proposed by Brown et al. [6]. The concept was originally presented for panorama stitching. Using the overlap regions between patches, gain compensation reduces the color differences between all adjacent patches. In our work, we adapt it to stitching patches of arbitrary shape.

A panorama uses images taken close together in time, so the colors of the images differ only slightly. Therefore, the original gain compensation uses only one gain variable for each stitching patch. However, the stitching patches in our case come from different images, and colors differ considerably between sources. A single gain variable is not sufficient in our case to reduce the color differences between adjacent patches.

Therefore, we use three gain variables for each patch, one for each color channel.
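The idea of one gain per color channel can be sketched in Python for a single pair of overlapping patches. This is not the procedure of Table II; it is a simplified illustration that assumes a quadratic error of the kind used in panorama gain compensation: the squared difference of the gain-corrected overlap means, plus a prior that keeps each gain near one. The function names and the values of `sigma_n` and `sigma_g` are illustrative assumptions.

```python
def channel_gains(mean_i, mean_j, sigma_n=10.0, sigma_g=0.1):
    """Gains (g_i, g_j) minimizing, for one channel,
    (g_i*a - g_j*b)^2 / sigma_n^2 + ((1 - g_i)^2 + (1 - g_j)^2) / sigma_g^2,
    where a and b are the overlap means of patches i and j."""
    a, b = mean_i, mean_j
    pn, pg = 1.0 / sigma_n ** 2, 1.0 / sigma_g ** 2
    # Normal equations: [[m11, m12], [m12, m22]] [g_i, g_j]^T = [pg, pg]^T
    m11, m12, m22 = a * a * pn + pg, -a * b * pn, b * b * pn + pg
    det = m11 * m22 - m12 * m12
    g_i = (pg * m22 - m12 * pg) / det  # Cramer's rule
    g_j = (m11 * pg - m12 * pg) / det
    return g_i, g_j


def chromatic_gains(mean_rgb_i, mean_rgb_j):
    """Apply channel_gains to R, G, and B independently."""
    pairs = [channel_gains(a, b) for a, b in zip(mean_rgb_i, mean_rgb_j)]
    return tuple(g for g, _ in pairs), tuple(g for _, g in pairs)
```

When the two overlap means are equal, both gains are exactly one; when patch i is brighter in a channel, its gain drops below one and the gain of patch j rises above one, narrowing the gap in that channel only.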

The method of chromatic gain compensation is briefly described in Table II. First, for each patch, we expand the patch region to form overlap regions with every adjacent patch.

Here we define i and j to be patch indices. Pairwise adjacent patches are defined as {i, j} ∈ H, where H is the set of all patch pairs with an overlap region. Each patch i comes from the database image with index f^i. I^i_{f^i} ∩ I^j_{f^j} is the overlap region of patch i and patch j.

If the overlap regions are too small, the result may be easily affected by the outliers like noise.

TABLE II

SEAMLESS PATCH STITCHING SCHEME (PART I)

1. Apply chromatic gain compensation

a. Expand each patch region to form overlap regions with every adjacent patch.

b. Find the dominant RGB cluster center for each patch on both the input image and the pieced-up image.
