Batch Sequencing for Run-to-Run Control: Application to Chemical Mechanical Polishing


Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036

Yih-Hang Chen, An-Jhih Su, Sheng-Jyh Shiu, Cheng-Ching Yu, and Shih-Haur Shen

Ind. Eng. Chem. Res., 2005, 44 (13), 4676-4686 • DOI: 10.1021/ie0491544 Downloaded from http://pubs.acs.org on November 28, 2008


Yih-Hang Chen,† An-Jhih Su,† Sheng-Jyh Shiu,† Cheng-Ching Yu,*,† and Shih-Haur Shen‡

Department of Chemical Engineering, National Taiwan University, Taipei 106-17, Taiwan, and Applied Materials Taiwan Ltd., Hsin-Chu 300, Taiwan

This work complements and extends the capability of run-to-run (R2R) control by sequencing the incomings such that improved control performance can be achieved. Unlike in chemical or mechanical systems, this is important for semiconductor manufacturing processes because some prior information about the incoming wafers is generally available. First, the limitation of an R2R control type of feedback system is explained. The frequency domain explanation is as follows: a negative feedback system is effective at rejecting a low-frequency type of disturbance. From this feedback property, the answer to the feed sequencing problem becomes clear: arrange the feed (which we are capable of doing) in such a way that it gives a low-frequency characteristic. Furthermore, a scalar load strength indicator is derived, and an engineering approach is taken to sort the incomings effectively. Two policies, from thin to thick (in terms of prethickness, policy L2R) and from thick to thin (policy R2L), are proposed, and they are shown to provide a better control performance than the conventional random feed policy (R). The feed sequencing problems are tested for systems with different dimensions, e.g., single input single output, single input multiple output, and multiple input multiple output systems, which include the model of an experimental chemical mechanical polishing process, and the results show that an improved control performance can be obtained simply by arranging the feeds.

1. Introduction

The general meaning of run-to-run (R2R) control is to implement control action between runs (or from batch to batch) of repetitive processes. Typically, R2R control is applied to processes such as robot systems, batch processes, and semiconductor manufacturing processes. This kind of control strategy uses the information generated from the previous run (e.g., error of output) to correct the process input such that the output at the next run will meet the product specification. In recent years, R2R control has received considerable attention in control of semiconductor manufacturing processes because the cost of control is relatively low compared to the loss from unqualified products. As pointed out by Sachs and co-workers,1,2 R2R control combines the advantages of both statistical process control (SPC) and feedback control to overcome shifts and drifts during process operation. Typically, the disturbance is characterized statistically, followed by the design of the feedback controller. Many R2R control algorithms have been developed for different disturbance characteristics. They include the following: exponentially weighted moving average (EWMA)1,2 and double-EWMA algorithms,3 age-based double EWMA,4 generic cell controller,5 optimizing adaptive quality controller,6 self-tuning controller,7 and a Hammerstein model based generalized R2R control algorithm.8 Applications of R2R control include chemical mechanical polishing (CMP), chemical vapor deposition, plasma etching, and photolithography.2-6,9,10 Moyne et al.11 give an updated summary of R2R control up to 2000, and Del Castillo12 provides comprehensive references on SPC and R2R control.

In the work of Patel et al.9 on R2R control of CMP of dielectric films, the following sources of process variations are clearly identified: (i) tool-induced, (ii) product-induced, and (iii) incoming disturbances. A two-tier approach was taken9 where, on the regulatory level, the local controllers compensate for tool-induced and incoming disturbances, while on the supervisory level the product-induced disturbance was rejected. It is therefore clear that, unlike typical R2R control of chemical or robotic systems, some prior knowledge about the quality of the incoming wafers is available in semiconductor manufacturing processes. Instead of performing R2R control on a preset feeding route, it is possible to arrange the feeding sequence such that a better control performance can be achieved. This is the objective of this work.

The remainder of this paper is organized as follows. In section 2, the strengths and limitations of a feedback system are explored and the feed sequencing problem is formulated mathematically. An engineering approach is taken to sequence the feed such that it represents a low-frequency type of disturbance into the feedback system. The effects of tuning, model mismatches, and time-varying parameters are explored, and the proposed approach is extended to multivariable systems [e.g., single input single output (SISO), single input multiple output (SIMO), and multiple input multiple output (MIMO) systems]. In section 3, a multizone CMP model is constructed from experiments and used to test the effectiveness of the proposed feeding policies via simulation. All simulations are carried out using MATLAB/Simulink. A lot of 24 wafers with surface profiles generated from the experiment is used as the simulated feed. Again, a significant improvement in the control performance is observed.

* To whom correspondence should be addressed. Fax: +886-2-3366-3037. E-mail: ccyu@ntu.edu.tw.

† National Taiwan University. ‡ Applied Materials Taiwan Ltd.


2. Feeding Sequence and R2R Control

2.1. Concept. 2.1.1. Feedback and Its Limitation. In this section, the basic ideas behind R2R control and the capability of such a feedback structure are discussed. As pointed out by Sachs et al.,2 R2R control is carried out via a feedback structure while being provided with statistical interpretations. On the implementation level, R2R control can be viewed as a supervisory control, which resets the set point of local controllers.9 Let us use the CMP process to illustrate the control architecture. The back pressure (p) and the platen speed (v) are controlled via local controllers. After a polishing run (possibly with a metrology delay if an ex situ measurement is employed), the error (e; i.e., the difference between the post-thickness and the target) can be used to update the down force or the platen speed. Typically, an I, PI, or PI$^2$ (integral, proportional-integral, or proportional-plus-double-integral) controller is employed for the set-point adjustment. Figure 1 illustrates the control structure for R2R control and, in a CMP notation, y represents the post-thickness, $y^{set}$ denotes the target, u, for example, is the back pressure, and $u^{set}$ is the set point of u. Because the dynamics of the local control loop are much faster than the process time in each run (i.e., batch time), the local controller is treated as a part of the process. Therefore, R2R control can be visualized as a feedback system and, in the control notation, the process is denoted as G and the controller is K.

From the control perspective, R2R control is a discrete-time system; the z-transformed transfer functions are generally employed as shown in the block diagram of the feedback system (Figure 2A). Figure 2A describes a typical feedback system with output y, input u, set point (or target) $y^{set}$, load disturbance d, a delay element ($z^{-1}$) as a result of the measurement taken from the previous run, process G, and controller K. In semiconductor manufacturing, the process is typically taken as a steady-state gain (i.e., without a dynamic element).11,12 Actually, this is a reasonable assumption because the manipulated inputs respond almost instantaneously to set-point changes in the local loop and the speed of the response is much faster as compared to the dynamics of the processing time. For a feedback system depicted in Figure 2A, the sensitivity function S describes the return difference between an input signal and the resultant output signal. For example, the relationship between y and d can be expressed in terms of the sensitivity function:

Consider a typical transfer function for CMP processes with $G(z) = K_p = 4000$ and an integral controller with $K(z) = K_c/(1 - z^{-1})$ and corresponding setting $K_c = 1/(10K_p)$. Figure 2B shows that the magnitude of the sensitivity function is less than unity at low frequency and approaches 1 asymptotically at high frequency. This implies that feedback control is effective in attenuating a low-frequency disturbance but can do little for a high-frequency one. Also note in Figure 2 that $\omega_N$ stands for the normalized frequency (i.e., $\omega_N = \omega T_s/\pi$, where $T_s$ is the sampling time). Although the bandwidth (e.g., the corner frequency, ∼0.1 in Figure 2B) may vary with the controller parameter, the shape of the sensitivity function clearly indicates the inherent limitation of a feedback system. In terms of R2R control (a feedback system), this means that control will be effective if the load disturbance is of a low-frequency nature and will provide little disturbance rejection capability if the load change is of a high-frequency nature. Fortunately, process shifts (e.g., a step change) and drifts (e.g., a ramp down) belong to the low-frequency type of disturbance, as can be understood from the frequency responses of these signals. Therefore, R2R control is expected to perform well.11,12
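The shape of the sensitivity function can be checked numerically. The following Python sketch evaluates |S| on the unit circle for the static gain and integral controller described above. It assumes that the unit measurement delay of Figure 2A is included in the loop, so that S = 1/(1 + G K z^-1); the function name and the default loop gain K_p K_c = 0.1 are illustrative choices, not part of the original study.

```python
import cmath
import math


def sensitivity_magnitude(omega_n, loop_gain=0.1):
    """Return |S| at normalized frequency omega_n (0 = DC, 1 = Nyquist).

    Assumes G = Kp, K = Kc / (1 - z^-1) and a unit delay in the loop, so
    S = 1 / (1 + Kp*Kc*z^-1 / (1 - z^-1))
      = (1 - z^-1) / (1 - (1 - loop_gain) * z^-1),  loop_gain = Kp*Kc.
    """
    z_inv = cmath.exp(-1j * math.pi * omega_n)  # z^-1 on the unit circle
    return abs((1 - z_inv) / (1 - (1 - loop_gain) * z_inv))


for omega_n in (0.001, 0.01, 0.1, 0.5, 1.0):
    print(f"omega_N = {omega_n:5.3f}  |S| = {sensitivity_magnitude(omega_n):.3f}")
```

Under these assumptions the magnitude is well below 1 at low frequency and settles near 1 at high frequency (it equals 2/(2 - K_p K_c) at the Nyquist frequency), which is the qualitative behavior described for Figure 2B.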

2.1.2. Problem Formulation. As mentioned earlier, in semiconductor manufacturing processes, we have some prior knowledge about the quality of incoming wafers (e.g., the thickness from prepolish measurement, which is denoted as the prethickness hereafter). The question then becomes, is it possible to achieve improved R2R control by rearranging the incomings? Let us use the prethickness variation in a CMP process as an example. Suppose, in the temporal mode, the average thickness of k incoming wafers is $y_{avg}$ (e.g., 8000 Å), and we intend to remove all of the copper using a fixed removal rate (e.g., 4000 Å/min) provided with a given polish time (e.g., 2 min). For the sake of clarity, let us assume that R2R control is used to adjust the polish time according to deviation from the target. From a linear theory, the incomings ($y_t$) that deviate from the average prethickness ($y_{avg}$) can be treated as incoming disturbances ($d_t$). That is

where $y_t$ is the tth feed and $d_t$ is the corresponding load disturbance. With the prethickness available, we can define a nominal feeding sequence as

Figure 1. Block diagram for typical R2R control.

$$S \equiv \frac{1}{1 + GK} = \frac{y}{d} \qquad (1)$$

Figure 2. Block diagram for a feedback system (A) and corresponding sensitivity function (B).

$$y_t = y_{avg} + d_t \qquad (2)$$

$$\mathbf{d}_0 = [d_1 \; d_2 \; \cdots \; d_k]^T \qquad (3)$$


where the superscript T denotes transpose. Note that the vector $\mathbf{d}_0$ defines the feed sequence in a temporal mode with a total of k incomings. Equations 2 and 3 indicate that the incomings are arranged into the sequence $y_1, y_2, ..., y_k$ with corresponding load disturbances $d_1, d_2, ..., d_k$. Any rearrangement of the feeding sequence can be expressed in terms of the permutation matrix $\mathbf{P}_j$:14

where $\mathbf{P}_j$ ($\in R^{k \times k}$) is a permutation matrix (with a single unity in each row and column and zeros elsewhere). For a system with k incomings, we have k! possible feeding sequences (i.e., k! permutation matrices for a k × k system14). It is clear that the number of possible feed arrangements grows factorially and, for a lot of 24 wafers, the number of possible feeding sequences is $24! \approx 6.2045 \times 10^{23}$.
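As a small illustration of eq 4 and of the size of the search space, the following Python sketch builds a permutation matrix for a given feeding order and applies it to a nominal sequence. The helper names and the three disturbance values are hypothetical and serve only to make the notation concrete.

```python
import math


def permutation_matrix(order):
    """Build the k x k permutation matrix P for a feeding order.

    order[c] is the index (in the nominal sequence d0) of the wafer fed at
    position c, so that the row-vector product d0^T P has entries d0[order[c]].
    """
    k = len(order)
    P = [[0] * k for _ in range(k)]
    for position, source in enumerate(order):
        P[source][position] = 1
    return P


def reorder(d0, P):
    """Return d_j^T = d0^T P as a list."""
    k = len(d0)
    return [sum(d0[r] * P[r][c] for r in range(k)) for c in range(k)]


d0 = [30.0, -10.0, 20.0]              # hypothetical load disturbances
P = permutation_matrix([1, 2, 0])     # feed wafer 2, then wafer 3, then wafer 1
print(reorder(d0, P))
print(math.factorial(24))             # number of feeding sequences for a 24-wafer lot
```

Each row and each column of P contains exactly one 1, so the product only reorders the entries of the nominal sequence.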

To evaluate the control performance, a performance measure is defined first. A 2-norm-based objective function is employed here. It can be expressed in either the time domain or the frequency domain:13

The frequency domain approach is taken to facilitate analysis. From the block diagram in Figure 2A, the relationship between the control error and load disturbance can be expressed as

Note that d in eq 6 represents a specific sequence of $d_t$'s that shapes the frequency content of a load disturbance. Here, d is defined by a permutation matrix $\mathbf{P}_j$ (eq 4), and for notational convenience, the subscript j is dropped hereafter. In other words, from the time domain perspective, $\mathbf{P}_j$ describes the shape of load disturbances, e.g., step, ramp up, ramp down, random, etc. With the definitions of all possible feed sequences and the performance measure, the optimal feed sequencing problem can be formulated as an optimization problem. That is, find a permutation matrix such that the 2-norm of the control error is minimized. Mathematically, we have

However, for a system with k incoming wafers, the size of the search space is k!, and it becomes impractical to solve the optimization problem by evaluating the integrals for k > 10. Some engineering insight is needed for the feed sequencing problem.
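To make the combinatorial cost concrete, the following Python sketch solves the sequencing problem by exhaustive search for a small lot. It is a time-domain stand-in for eq 7: the cost is the sum of squared run errors, and the error recursion is an assumed closed-loop model (static gain, integral-only controller with loop gain K_p K_c, controller starting from zero). The disturbance values are hypothetical.

```python
import itertools


def closed_loop_cost(sequence, loop_gain=1.0 / 6.0):
    """Sum of squared run errors under integral-only R2R control.

    Assumes a static gain process in deviation variables with the controller
    starting from zero, which gives the error recursion
        e_1 = d_1,   e_{t+1} = (1 - loop_gain) * e_t + (d_{t+1} - d_t),
    where loop_gain = Kp * Kc.
    """
    error = sequence[0]
    cost = error ** 2
    for previous, current in zip(sequence, sequence[1:]):
        error = (1.0 - loop_gain) * error + (current - previous)
        cost += error ** 2
    return cost


def best_sequence(disturbances):
    """Exhaustively search all k! orderings (only practical for small k)."""
    return min(itertools.permutations(disturbances), key=closed_loop_cost)


d0 = [0.4, -1.2, 0.9, -0.3, 1.5, -0.7, 0.1]   # hypothetical, k = 7 gives 5040 orderings
best = best_sequence(d0)
print(best, closed_loop_cost(best))
print(closed_loop_cost(sorted(d0)), closed_loop_cost(sorted(d0, reverse=True)))
```

Seven wafers already require 5040 cost evaluations, and each additional wafer multiplies the count by the new lot size, which is why a sorting heuristic is attractive.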

2.1.3. Heuristic Approach. As mentioned earlier, R2R control, a feedback system, is effective in eliminating low-frequency disturbances (Figure 2B) and, therefore, a better R2R control performance can be obtained if the incomings can be arranged in such a manner that the feeding sequence shows a low-frequency characteristic. Let us use a generic system to illustrate the effect of the feeding sequence. Consider a CMP example where the prethickness of the incomings follows a normal distribution

where the subscript t is the temporal index that denotes the tth feed and $\varepsilon_t$ is a normally distributed error with zero mean and variance $\sigma^2$. That implies (eq 2) that

On the basis of engineering feasibility, three feeding policies are devised. The first one is to arrange the feeds from the left side of a normal distribution to the right (left to right according to the distribution as shown in Figure 3), which is denoted as policy L2R; the second one is from right to left (denoted as policy R2L); and the third one is to arrange the feed randomly (denoted as policy R, as shown in Figure 3). In terms of the prethickness in the CMP, policy L2R means from thin to thick, policy R2L implies from thick to thin, and policy R stands for doing nothing. One may devise much more sophisticated feeding policies, but it should be emphasized that these three policies are relatively easy to implement. The next question then becomes, what are the frequency contents of these feeding policies? The power spectral density (PSD) is often used to evaluate the frequency content of a time series. Let us take 500 data points from a normal distribution. The first column of Figure 4 shows the feeding sequences for these three policies, where policies L2R and R2L indicate smooth changes among runs while policy R shows a typical random behavior of the incomings. After Fourier transformation, the PSDs in Figure 4 clearly reveal that policies L2R and R2L have low-frequency characteristics while policy R displays the same strength over the entire frequency range. From the analyses of the frequency content of load disturbances (Figure 4) and the effective frequency range of a feedback system (Figure 2B), it is expected that policies L2R and R2L will give a much better control performance than policy R. Note that, by control performance, we mean the 2-norm of the error using the same control algorithm with the same tuning parameters except for different feeding sequences.
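The frequency content of the three policies can be compared with a short numerical experiment. The Python sketch below draws 500 samples from a standard normal distribution, forms the three feeds, and reports the share of periodogram power that falls in the lowest tenth of the frequency bins. The raw periodogram, the 10% band, and the random seed are illustrative assumptions; the original study does not state how its PSDs were estimated.

```python
import cmath
import math
import random


def periodogram(series):
    """Raw periodogram |X_k|^2 / N for k = 1 .. N//2 (mean removed)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power.append(abs(coeff) ** 2 / n)
    return power


def low_frequency_fraction(series, band=0.1):
    """Share of periodogram power in the lowest `band` of the frequency bins."""
    power = periodogram(series)
    cutoff = max(1, int(band * len(power)))
    return sum(power[:cutoff]) / sum(power)


rng = random.Random(0)                       # fixed seed, arbitrary choice
policy_r = [rng.gauss(0.0, 1.0) for _ in range(500)]
policy_l2r = sorted(policy_r)                # thin to thick
policy_r2l = sorted(policy_r, reverse=True)  # thick to thin
for name, feed in (("R", policy_r), ("L2R", policy_l2r), ("R2L", policy_r2l)):
    print(name, round(low_frequency_fraction(feed), 3))
```

Because reversing a sequence leaves the magnitude of its Fourier transform unchanged, policies L2R and R2L give the same fraction, while the random feed spreads its power roughly evenly over all bins.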

$$\mathbf{d}_j^T = \mathbf{d}_0^T \mathbf{P}_j \qquad (4)$$

$$\|e\|_2^2 = \int_0^\infty e(t)^2 \, dt = \frac{1}{\pi} \int_0^\infty |e(i\omega)|^2 \, d\omega \qquad (5)$$

$$e(i\omega) = \frac{d(i\omega)}{1 + GK(i\omega)} \qquad (6)$$

$$\min_{\mathbf{P}_j} \|e\|_2^2 = \min_{\mathbf{P}_j} \frac{1}{\pi} \int_0^\infty \left| \frac{d(i\omega)}{1 + GK(i\omega)} \right|^2 d\omega \qquad (7)$$

$$y_t = y_{avg} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2) \qquad (8)$$

$$d_t = \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2) \qquad (9)$$

Figure 3. Three feeding policies based on a normal distribution: L2R (thin to thick), R2L (thick to thin), and R (random).

2.2. SISO Systems. The following linear SISO example mimics R2R control of the prethickness variation in a CMP process. For 500 incoming wafers, the average copper thickness to be removed is $y_{avg} = 8000$ Å with a fixed removal rate RR = 4000 Å/min [$G(z) = K_p = 4000$], and this corresponds to a polish time of 2 min (u = 2). Without R2R control, the polishing process will be carried out accordingly and, obviously, errors will result if the prethicknesses are different from the average one. This is referred to as the open-loop error ($e_{OL}$). If the prethickness of the incomings follows a normal distribution (eqs 8 and 9), the square of the open-loop error is simply the variance of the incoming thickness variation (i.e., $\|e_{OL}\|_2^2 = \sigma^2$). In the context of feedback control, system variables, e.g., y and u, typically are expressed in terms of deviation variables (i.e., deviation from the nominal steady state). Thus, R2R control of the simple CMP process is translated into a feedback system with a static process transfer function [$G(z) = 4000$], a unit delay in the feedback loop ($z^{-1}$), and a controller K (Figure 2A). For each wafer, the deviation from the average thickness is treated as load disturbance $d_i$ (eqs 2, 8, and 9) and the sequence of the load change (d) is defined by the permutation matrix $\mathbf{P}_j$ (eq 4). For the gain process plus unit delay, if the zeroth-order Padé approximation is employed, the internal model control principle gives an integral-only controller. That is

$$K(z) = \frac{K_c}{1 - z^{-1}} \qquad (10)$$

with a tuning constant $K_c$, where here $K_c$ is set to 1/6 of $K_p$. In the time domain expression, the control action is computed according to

where the subscript t is the time index. Consider the case of 500 incoming wafers and $d_i$ following a normal distribution with a variance of 1. Let us use this simple example to illustrate the importance of the feeding sequence. All simulations are carried out using MATLAB/Simulink. A comparison is made between the open-loop error and closed-loop error (under R2R feedback control) for three feeding policies.

Figure 4. Three feeding policies and corresponding PSDs for the SISO example with 500 runs generated from a normal distribution.

For policy R, R2R offers little improvement over the open-loop case, as shown in Figure 5. The error ratio (the closed-loop error over the open-loop error, $\|e_{CL}\|_2^2 / \|e_{OL}\|_2^2$) for the random feed policy is close to 1, and practically no improvement is made by feedback control. This is within one’s expectation because policy R excites the entire frequency range uniformly (Figure 5). On the other hand, policies L2R and R2L provide significant improvement over the open-loop case, as can be seen from the error ratios of 4-5%, a 95% improvement! The results also coincide with one’s intuition, as shown in the PSDs of these two feeding sequences. The behavior of the manipulated variables (u) for the L2R and R2L cases may raise some concern for possible instability (Figure 5). However, as one compares the shape of the load disturbance with that of the manipulated variable in Figure 5, it becomes clear that the manipulated variable counteracts the load change and, unless the load change becomes unstable (e.g., the prethickness grows to infinity), the feedback system will remain stable.

One may ask whether the difference in the nominal performance is the result of controller tuning. Again, this can be explained from the PSD of the feed sequences. R2R control gives virtually no performance improvement for policy R as a result of the inherent limitation imposed on the feedback system. This situation holds regardless of controller settings. As for the R2L policy (similarly for the L2R policy), the PSD is a monotonically decreasing function of frequency and, therefore, a tighter controller tuning (i.e., increasing $K_c$) will result in a better control performance. The same result can be found when model mismatches arise.
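The SISO comparison can be reproduced in outline with a few lines of Python. This is a minimal sketch rather than the MATLAB/Simulink model used in the study: the update law u_{t+1} = u_t - K_c e_t is an assumed time-domain form of the integral-only controller with a one-run delay, the tuning is interpreted as a loop gain K_p K_c = 1/6, and the seed is arbitrary.

```python
import random


def simulate_r2r(disturbances, kp=4000.0, loop_gain=1.0 / 6.0):
    """Simulate integral-only R2R control of a static gain process.

    Deviation variables are used, so the target is 0. The update
    u_{t+1} = u_t - Kc * e_t with Kc = loop_gain / Kp is an assumed
    time-domain form of K(z) = Kc / (1 - z^-1) with a one-run delay.
    Returns the list of run errors e_t.
    """
    kc = loop_gain / kp
    u = 0.0
    errors = []
    for d in disturbances:
        e = kp * u + d        # post-polish deviation for this run
        errors.append(e)
        u = u - kc * e        # set-point correction for the next run
    return errors


def error_ratio(disturbances, **kwargs):
    """Closed-loop over open-loop squared 2-norm of the error."""
    closed = sum(e * e for e in simulate_r2r(disturbances, **kwargs))
    opened = sum(d * d for d in disturbances)   # open loop: e_t = d_t
    return closed / opened


rng = random.Random(1)                          # arbitrary seed
feed_r = [rng.gauss(0.0, 1.0) for _ in range(500)]
for name, feed in (("R", feed_r),
                   ("L2R", sorted(feed_r)),
                   ("R2L", sorted(feed_r, reverse=True))):
    print(name, round(error_ratio(feed), 3))
```

With these assumptions the random feed gives a ratio near 1 and the sorted feeds give a much smaller ratio. The exact values depend on the seed and on the assumed tuning and need not match Figure 5. Passing a larger `loop_gain` (below the stability limit) lets one explore the tuning effect discussed above.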

2.3. Nonsquare Systems. Up to now, we have addressed the feed sequencing problem for SISO systems. Unfortunately, most of the semiconductor manufacturing processes are of the distributed parameter nature. Thus, we have a nonsquare multivariable system. Two possible cases are explored. One is the system with a single manipulated variable, a SIMO system, and the other is the MIMO system, systems with multiple manipulated variables.

2.3.1. SIMO Systems. Let us use a simple static example to illustrate the effects of the feeding sequence in R2R control. Consider a nonsquare linear process with m outputs and one input.

where $\mathbf{y}$ ($\in R^{m \times 1}$) is the process output with the entry $y_i$, u ($\in R^1$) is the process input, and $\mathbf{K}_p$ ($\in R^{m \times 1}$) consists of the model parameters. Note that the spatial mode is discussed here. If eq 12 describes a Cu polishing process, the $y_i$'s can be viewed as the copper thickness from the center ($y_1$) toward the edge ($y_m$), the $K_{pi}$'s denote removal rates across the radial position, and u denotes the polish time. It should be emphasized here that, for a multivariable system under feed sequencing, two modes (and corresponding indices) should be distinguished. One is the temporal mode with the subscript t (eq 2), and the other is the spatial mode with the subscript i (e.g., eq 12). Assume that the tth incoming wafer can be described by

where $y_{avg,i}$ is the average thickness in the ith wafer position (i = 1, ..., m) for all incoming wafers (t = 1, ..., k) and $\varepsilon_{ti}$ is a normally distributed error with zero mean and variance $\sigma_i^2$. Equation 13 describes a typical surface profile of an incoming wafer.

Figure 5. Control performance (in terms of the error ratio in 2-norm) of three feeding policies for the SISO example with 500 runs.

Figure 6. Block diagrams of the original feedback system (A) and SVD-based design for a nonsquare multivariable system (B).

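$$\mathbf{y} = \mathbf{K}_p u \qquad (12)$$

$$\mathbf{y}_t = \mathbf{y}_{avg} + \boldsymbol{\varepsilon}_t = \begin{bmatrix} y_{avg,1} + \varepsilon_{t1} \\ \vdots \\ y_{avg,m} + \varepsilon_{tm} \end{bmatrix}, \quad \varepsilon_{ti} \sim N(0, \sigma_i^2) \qquad (13)$$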


Because we are more interested in the planarization of the wafer surface, variations across the radial position should also be treated as load disturbances for the control system to overcome. Thus, the average wafer surface across the radial position (in a spatial mode) is used. That is

where $\bar{y}$ is the average wafer thickness in the temporal and spatial modes. Following eq 2, we have

The disturbance vector for the tth incoming can be further divided into two parts:

Here, $\mathbf{d}^s$ represents the disturbance in the spatial mode (it will not change with the feeding sequence), and $\mathbf{d}_t^t$ denotes the disturbance in the temporal mode. It then becomes clear that the feeds should be sequenced based on the temporal part of the load disturbance ($\mathbf{d}_t^t = \boldsymbol{\varepsilon}_t$) because the deviation in the spatial mode (e.g., $y_{avg,i} - \bar{y}$) comes into the system as a steplike disturbance, a low-frequency type of disturbance, which can be taken care of by the R2R feedback controller. The appendix offers an alternative approach to validating that the feed should be sequenced according to the temporal part of the disturbance.

Figure 7. Control performance (in terms of the error ratio in 2-norm) of feed sequencing according to the load strength indicator (d*) of two feeding policies for the simple 2 × 1 example.

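$$\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_{avg,i} \qquad (14)$$

$$\mathbf{y}_t = \bar{\mathbf{y}} + \mathbf{d}_t = \begin{bmatrix} \bar{y} \\ \vdots \\ \bar{y} \end{bmatrix} + \begin{bmatrix} (y_{avg,1} - \bar{y}) + \varepsilon_{t1} \\ \vdots \\ (y_{avg,m} - \bar{y}) + \varepsilon_{tm} \end{bmatrix} \qquad (15)$$

$$\mathbf{d}_t = \mathbf{d}^s + \mathbf{d}_t^t = \mathbf{d}^s + \boldsymbol{\varepsilon}_t = \begin{bmatrix} (y_{avg,1} - \bar{y}) \\ \vdots \\ (y_{avg,m} - \bar{y}) \end{bmatrix} + \begin{bmatrix} \varepsilon_{t1} \\ \vdots \\ \varepsilon_{tm} \end{bmatrix} \qquad (16)$$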


With the temporal part of the load variable ($\mathbf{d}_t^t$) defined, how can we arrange the feed based on a vector? The block diagram in Figure 6B presents an attractive solution: reduce the dimension to 1 according to the singular value decomposition (SVD). Thus, one obtains

Here $d_t^*$ is a scalar that can be used as an indicator of the strength of the load change in the temporal mode. Without loss of generality, consider a 2 × 1 case with $K_{p1} = K_{p2} = 4000$, where $\varepsilon_{ti}$ is a normally distributed error with zero mean and a variance of 1. Let us further assume that $y_{avg,1} = y_{avg,2}$. Again, policies R and R2L are used to illustrate the effects of the feed sequencing (based on $d_t^*$) on the control performance. R2R control is designed using the SVD (Figure 6B), and the controller gain ($K_c$) is set to 1/6 of $K_p$. Figure 7 reveals that, similar to the SISO case, the error ratio of policy R stays close to 1 while policy R2L gives almost 50% improvement over the uncontrolled case. It is observed that the margin of improvement, despite being significant, is much less than that of the SISO example (∼95% in Figure 5). The reason is that, for a nonsquare system, it is not possible to keep both outputs at set points with only one manipulated variable and a certain degree of control error will remain, regardless of the feed arrangement and/or complexity of the controllers. Figure 7 also shows that the load strength indicator ($d_t^*$) indeed shows a low-frequency characteristic for policy R2L as opposed to the random behavior for policy R. This is exactly why the feedback mechanism of R2R control will be in effect and a 50% improvement can be made.
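The 2 × 1 example can be sketched in Python as follows. For a single-input process the SVD is available in closed form (U is the normalized gain vector and the singular value is its norm), so no linear algebra library is needed. The controller update, the interpretation of the tuning as a loop gain of 1/6, and the seed are assumptions made for illustration.

```python
import math
import random


def simo_indicator(kp, eps):
    """d*_t = U^T eps_t for a single-input process.

    For an m x 1 gain vector the SVD is Kp = U * sigma * 1 with
    U = Kp / ||Kp|| and sigma = ||Kp||.
    """
    norm = math.sqrt(sum(k * k for k in kp))
    return sum(k / norm * e for k, e in zip(kp, eps))


def simo_error_ratio(kp, feed, loop_gain=1.0 / 6.0):
    """Closed-loop over open-loop squared error for SVD-based integral control.

    Assumed update: u_{t+1} = u_t - (loop_gain / sigma) * U^T e_t.
    """
    sigma = math.sqrt(sum(k * k for k in kp))
    u, closed, opened = 0.0, 0.0, 0.0
    for eps in feed:
        e = [k * u + x for k, x in zip(kp, eps)]   # error vector for this run
        closed += sum(v * v for v in e)
        opened += sum(x * x for x in eps)
        u -= loop_gain / sigma * simo_indicator(kp, e)
    return closed / opened


rng = random.Random(2)                              # arbitrary seed
kp = [4000.0, 4000.0]
feed_r = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(500)]
feed_r2l = sorted(feed_r, key=lambda eps: simo_indicator(kp, eps), reverse=True)
print("R  ", round(simo_error_ratio(kp, feed_r), 3))
print("R2L", round(simo_error_ratio(kp, feed_r2l), 3))
```

With equal gains, only the component of the disturbance along U (the average of the two positions) can be influenced by the single input. The component orthogonal to U passes through unchanged, which is consistent with the residual error of roughly one-half described above.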

2.3.2. MIMO Systems. The analysis so far deals with a system with only a single input, and the structure of the process (SIMO) leads to a scalar load strength indicator. However, in semiconductor manufacturing, systems with multiple manipulated inputs are sometimes encountered (i.e., $\mathbf{K}_p \in R^{m \times n}$ with m ≥ n > 1). How can we generate a load strength indicator in this case? Again, the SVD sheds some light in this direction. Consider a general (square or nonsquare) multivariable process $\mathbf{K}_p$ ($\in R^{m \times n}$ with m ≥ n) with the temporal part of the load vector $\mathbf{d}_t^t$ ($\in R^{m \times 1}$). Unlike the SIMO case, the product $\mathbf{U}^T \mathbf{d}_t^t$ is an n × 1 vector:

Note that U is an orthonormal matrix where the strength along the principal direction is not explicitly taken into consideration; it is described by the singular values in the diagonal Σ matrix ($\in R^{n \times n}$).14,15 Taking the degree of stretching into consideration, we arrive at the following scalar load strength indicator:

where sum(·) is the summation of all elements in the vector (·). Note that $d_t^*$ in eq 19 is the load strength indicator for a general m × n system and eq 17 is simply a special case for n = 1 (note that the nonzero value of $\sigma_1$ will not affect the result of sorting).

With the load strength indicator ($d_t^*$) available, the incoming wafers can be arranged in the following steps:

S1. Find the average thickness across the radial position ($y_{avg,i}$, i = 1, ..., m) for all incomings (t = 1, ..., k) and the corresponding temporal deviations from the average ones ($\mathbf{d}_t^t = \boldsymbol{\varepsilon}_t$) (eq 16).

S2. Transform the vector-valued $\boldsymbol{\varepsilon}_t$ into the scalar load strength indicator ($d_t^*$) for the tth incoming feed (eq 19).

S3. Sequence the feed according to $d_t^*$ for all incomings (t = 1, ..., k) using policy R2L or L2R, whichever has a smaller absolute value of $d_1^*$ (the first incoming is practically uncontrolled in the R2R framework).

These three steps complete the engineering feed sequencing for effective R2R control. Also note that for special cases, e.g., SISO systems, step S2 can be bypassed and the procedure can be simplified significantly.
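Steps S1 to S3 can be sketched in Python as below. The sketch assumes that the matrix U and the singular values of the gain matrix are already available (for example, from an SVD routine), and it takes the load strength indicator to be the weighted sum in eq 19. The function names and the four two-point wafer profiles are hypothetical.

```python
def load_strength(sigma, U, eps):
    """Eq 19: d* = sum_j sigma_j * sum_i U[i][j] * eps[i]."""
    m, n = len(U), len(sigma)
    return sum(sigma[j] * sum(U[i][j] * eps[i] for i in range(m))
               for j in range(n))


def sequence_feed(profiles, sigma, U):
    """Return wafer indices in feeding order following steps S1-S3."""
    k, m = len(profiles), len(profiles[0])
    # S1: average profile over all incomings and temporal deviations.
    y_avg = [sum(p[i] for p in profiles) / k for i in range(m)]
    eps = [[p[i] - y_avg[i] for i in range(m)] for p in profiles]
    # S2: scalar load strength indicator for every wafer.
    d_star = [load_strength(sigma, U, e) for e in eps]
    # S3: sort, then pick the direction whose first wafer has the smaller |d*|.
    l2r = sorted(range(k), key=lambda t: d_star[t])
    r2l = l2r[::-1]
    return l2r if abs(d_star[l2r[0]]) <= abs(d_star[r2l[0]]) else r2l


# Hypothetical 2-position, 1-input example: U and sigma from Kp = [4000, 4000]^T.
U = [[0.5 ** 0.5], [0.5 ** 0.5]]
sigma = [4000.0 * 2 ** 0.5]
profiles = [[8100.0, 8050.0], [7900.0, 7950.0], [8000.0, 8020.0], [8300.0, 8250.0]]
print(sequence_feed(profiles, sigma, U))
```

The returned list gives the positions of the wafers in the original lot, in the order in which they would be fed to the polisher.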

$$d_t^* = \mathbf{U}^T \mathbf{d}_t^t = \mathbf{U}^T \boldsymbol{\varepsilon}_t = [u_{11} \; \cdots \; u_{1m}] \begin{bmatrix} \varepsilon_{t1} \\ \vdots \\ \varepsilon_{tm} \end{bmatrix} = \sum_{i=1}^{m} u_{1i} \varepsilon_{ti} \qquad (17)$$

$$\mathbf{U}^T \mathbf{d}_t^t = \mathbf{U}^T \boldsymbol{\varepsilon}_t = \begin{bmatrix} u_{11}\varepsilon_{t1} + u_{21}\varepsilon_{t2} + \cdots + u_{m1}\varepsilon_{tm} \\ \vdots \\ u_{1n}\varepsilon_{t1} + u_{2n}\varepsilon_{t2} + \cdots + u_{mn}\varepsilon_{tm} \end{bmatrix} \qquad (18)$$

$$d_t^* = \mathrm{sum}(\boldsymbol{\Sigma} \mathbf{U}^T \mathbf{d}_t^t) = \sum_{j=1}^{n} \sigma_j \sum_{i=1}^{m} u_{ij} \varepsilon_{ti} \qquad (19)$$

3. Nonsquare CMP Example

3.1. Multizone CMP. In recent years, CMP has become an important technology because of the requirement of global wafer planarity for multilevel interconnect devices in integrated circuit fabrication. CMP also gives the advantages of defect reduction, wide windows for etching and lithography, and yield improvement.16-18 Despite recent advances in CMP, some manufacturing concerns associated with successful implementation of CMP remain to be overcome.16-23 In theory, CMP can achieve global planarity, but the problem of within wafer nonuniformity (WIWNU) still remains one of the major operation concerns. WIWNU indicates the variation in the surface thickness across the wafer radial position, especially on the edge. Besides, in the surface profile of wafers produced from an electrochemical plating (ECP) process, it appears that the metallic layer is thicker on the edge area. Thus, a new type of CMP, multizone CMP, is implemented.24 Multizone CMP is expected to reduce WIWNU and to achieve a wider process window. Unlike the typical single-zone configuration, the wafer carrier is divided into three zones in the radial position and different pressures can be applied to each zone (Figure 8).

Figure 8. Schematics of a three-zone CMP.

For a wafer carrier with a radius of 150 mm, the first zone covers 0-130 mm, the second zone ranges from 130 to 140 mm, and the third zone covers 140 mm and beyond. Next, the well-known Preston’s equation25 is used to model the polishing process. It models the material removal as a linear function of pressure and rotation speed.

$$RR = K \cdot p \cdot v \qquad (20)$$

where K is the Preston constant, p is pressure, and v is the rotation speed. The Preston equation can be extended to multizone CMP in a straightforward manner. For the multizone system, the relationship for the removal rate at the ith radial position can be expressed as

$$RR_i = \sum_j K_{ij} p_j v = \sum_j K_{p,ij} p_j \qquad (21)$$

where $K_{ij}$ is the local Preston constant describing the effect of pressure from the jth zone on the ith radial position, $p_j$ denotes the pressure of the jth zone, and v is again the rotation speed. Because the rotation speed is fixed throughout all runs, v is absorbed into $K_{p,ij}$ for subsequent development.

Figure 9. Columns of the steady-state gain matrix $\mathbf{K}_p$ (A) and columns of the output orthonormal matrix U (B) for the three-zone CMP example.

The next task is to find all of the constants to describe the multizone CMP system. In the modeling phase, two two-level factorial designs are carried out on all three pressures (20 combinations). For the Applied Materials’ Mirra polisher using a 300-mm Titan Profiler head, the copper thickness across the radial position is measured. The measurement is converted to 59 data points along the radial position. Then, the approximate removal rate at each radial position can be obtained by dividing the amount of copper removed by the total polish time, which is set at 59 s in each test. In a control notation, the multizone polishing model can be written as

$$\mathbf{y} = \mathbf{K}_p \mathbf{u} \qquad (22)$$

where y is the removal rate vector ($\in R^{59 \times 1}$), $\mathbf{K}_p$ is a steady-state gain matrix ($\in R^{59 \times 3}$) describing local Preston constants, and u is a vector of input pressures ($\in R^{3 \times 1}$). Given experimental data, the $\mathbf{K}_p$ matrix was simply determined using least-squares regression. The regression result versus wafer radial position is plotted in Figure 9. The effects of input pressures on each zone appear to be quite reasonable. The pressure applied on zone 1, $u_1$, shows a significant effect on the removal rate in the range of 0-110 mm, and the influence degrades gradually toward the wafer edge, as can be seen from the numerical values of the first column of $\mathbf{K}_p$ (i.e., $\mathbf{K}_{p1}$). The pressure applied on the second zone, $u_2$, shows a large value of the Preston constant around 135 mm, and its influence on the removal rate diminishes toward both ends, as shown in $\mathbf{K}_{p2}$. As expected, the input pressure of the third zone shows little effect on the removal rate until 135 mm, and then the influence grows linearly toward the edge.
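The least-squares step can be illustrated for a single radial position. The Python sketch below estimates one row of the gain matrix from a set of polishing tests by solving the normal equations. It assumes a model without an intercept term, as in eq 22, and the pressure settings and gain values are hypothetical, noise-free numbers chosen only so that the fit can be checked against a known answer.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_gain_row(pressures, removal_rates):
    """Least-squares estimate of one row of Kp from y_i = sum_j Kp_ij * p_j.

    Solves the normal equations (P^T P) k = P^T y, where each row of P is
    the pressure vector of one polishing test.
    """
    n = len(pressures[0])
    ptp = [[sum(p[a] * p[b] for p in pressures) for b in range(n)] for a in range(n)]
    pty = [sum(p[a] * y for p, y in zip(pressures, removal_rates)) for a in range(n)]
    return solve_linear(ptp, pty)


# Hypothetical noise-free data generated from a known gain row.
true_row = [900.0, 300.0, 50.0]
tests = [[1.0, 1.0, 1.0], [2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0], [2.0, 2.0, 1.0]]
rates = [sum(k * p for k, p in zip(true_row, t)) for t in tests]
print(fit_gain_row(tests, rates))
```

Repeating the fit for each of the 59 radial positions would fill in the gain matrix row by row.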

Next, we perform the SVD on the $\mathbf{K}_p$ matrix. This leads to a 59 × 3 U matrix, a 3 × 3 $\mathbf{V}^T$ matrix, and a 3

Figure 10. Surface profiles of 24 wafers (a lot) after ECP and right before CMP.

Figure 11. Control performance (in terms of the error ratio in 2-norm) of feed sequencing according to the load strength indicator (d*)

(11)

Experimentally obtained surface profiles of 24 wafers (a lot) are used for R2R control, as shown in Figure 10. These surface profiles were obtained from Cu deposition via the ECP process, which is carried out with Applied Materials' iECP, with the copper thickness varying from 8500 to 12 000 Å. All copper thickness measurements were made with the i-Scan sensor. In the following, a simulation study is carried out to evaluate the R2R control performance using the profiles in Figure 10.

3.2. Control. Multivariable R2R control is explored. This corresponds to a 59 × 3 multivariable control problem. Following the SVD, the controller is arranged according to the block diagram in Figure 6B. The tuning constants of the integral-only controllers for all three diagonal elements in $K_{diag}$ are set to 1/6. Following the procedure in section 2.3, two feeding policies are examined. The feed sequencing procedure is as follows: (1) perform SVD on $K_p$, (2) compute $y_{avg,i}$ ($i = 1, ..., 59$) across the radial position for all 24 incoming wafers ($t = 1, ..., 24$), (3) find the corresponding temporal deviations from the average ones ($E_t = y_t - y_{avg}$) for each incoming (eq 13), (4) compute the load strength indicator $d_t^*$ (eq 19), and (5) sort the feed according to $d_t^*$. The 24 wafers are sorted accordingly for the R2L policy and permuted randomly for policy R. The result (Figure 11) indicates that a 35% improvement in the control performance can be achieved by arranging these 24 incoming wafers properly (policy R2L), while the random feed policy (policy R) offers practically no improvement over the open-loop error.
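The five sequencing steps can be sketched as follows. The transformed load $d_t^* = U^T E_t$ follows eq A4 of the Appendix; because eq 19 is not reproduced in this section, the reduction to a scalar (the projection on the first column of $U$) is an assumption made only for illustration, as are the function name `sequence_feed` and the synthetic lot of flat profiles.

```python
import numpy as np

def sequence_feed(kp, profiles, policy="R2L"):
    """Order incoming wafers by a scalar load strength indicator.

    kp:        (n_points, n_zones) steady-state gain matrix
    profiles:  (n_wafers, n_points) incoming thickness profiles y_t
    policy:    "R2L" (largest indicator first) or "L2R" (smallest first)
    returns:   (order, indicator), where order indexes the rows of profiles
    """
    if policy not in ("R2L", "L2R"):
        raise ValueError("policy must be 'R2L' or 'L2R'")
    # Step 1: economy-size SVD, so U has shape (n_points, n_zones).
    u, _, _ = np.linalg.svd(kp, full_matrices=False)
    # SVD signs are arbitrary; orient the first column so that a larger
    # projection corresponds to a thicker wafer.
    if u[:, 0].sum() < 0:
        u[:, 0] *= -1
    # Step 2: average profile across the lot (y_avg).
    y_avg = profiles.mean(axis=0)
    # Step 3: temporal deviations E_t = y_t - y_avg.
    deviations = profiles - y_avg
    # Step 4: transformed load d_t* = U^T E_t (eq A4), one row per wafer.
    d_star = deviations @ u
    # ASSUMPTION: scalar indicator = component along the first column of U.
    indicator = d_star[:, 0]
    # Step 5: sort; ascending is thin to thick (L2R), reversed is R2L.
    order = np.argsort(indicator)
    if policy == "R2L":
        order = order[::-1]
    return order, indicator

# Synthetic lot (ASSUMPTION): 24 flat profiles with distinct mean thickness.
rng = np.random.default_rng(1)
kp = rng.uniform(0.1, 1.0, size=(59, 3))
offsets = rng.permutation(np.linspace(8500.0, 12000.0, 24))
profiles = offsets[:, None] * np.ones((1, 59))

order, indicator = sequence_feed(kp, profiles, policy="R2L")
print(offsets[order][:3])   # mean thickness of the first three wafers fed
```

For the random policy R used as the baseline, the order would instead be a random permutation of the wafer indices, for example `rng.permutation(24)`.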

4. Conclusion

In semiconductor manufacturing processes, particularly CMP, we have some prior knowledge about the quality of incoming wafers (because they come directly from the previous processing step). In this work, we intend to answer the following question: Is it possible to achieve improved R2R control by rearranging the incomings? First, the characteristics of a feedback system are studied, and its limitation is explained in the frequency domain. That is, a negative feedback system is effective at rejecting a low-frequency type of disturbance, and this is exactly why R2R control has been shown to be effective at overcoming the shifts/drifts often encountered in semiconductor manufacturing processes. This work strengthens the capability of R2R control by sequencing the incomings such that improved control performance can be achieved. From the feedback property, the answer to the feed sequencing problem becomes obvious: arrange the feed (i.e., load disturbance) in such a way that it gives a low-frequency characteristic. Furthermore, a scalar load strength indicator is proposed and an engineering approach is taken to sort the incomings effectively (as opposed to exhausting all k! possibilities). Two policies are proposed: from thin to thick (in terms of the prethickness, policy L2R) and from thick to thin (policy R2L). They are shown to provide significant improvement in the control performance, while the typically used random feed policy (R) shows practically no reduction in the closed-loop error over the open-loop one. The issues of controller tuning, model mismatches, and time-varying (slow-drifting)

experimental CMP process. Again, the results show that a significant improvement in the control performance can be obtained by arranging the feed properly. Finally, it is important to recognize that this improvement is achieved by extremely simple means: rearranging the incomings as a low-frequency type of disturbance.

Acknowledgment

This work was supported in part by the National Science Council of Taiwan under grant NSC 93-2214-E002-026.

Appendix: Load Strength Indicator for Multivariable Systems Based on Achievable Performance

Another approach to finding the correct feed sequencing measure is to take the capability of the least-squares-based controller into account. The achievable performance is used here and, without further information, let us assume that the desired removed thickness is simply $y_{avg,i}$. From the achievable performance for a nonsquare system, we have

$$E_{min} = y_{set} - y_{best} = (I - K_p K_p^{\dagger})\, y_{set} = (I - U U^T)\, y_{set} \tag{A1}$$

From eqs 14 and 16, we have

$$y_t = y_{best} + (y_{avg} - y_{best}) + E_t = y_{best} + (I - U U^T)\, y_{avg} + E_t \tag{A2}$$

The second and third terms on the right-hand side of eq A2 can be viewed as the load changes. Thus, one obtains

$$d_t = (I - U U^T)\, y_{avg} + E_t \tag{A3}$$

From the definition of the load strength indicator, we arrive at

$$d_t^* = U^T d_t = U^T (I - U U^T)\, y_{avg} + U^T E_t = U^T E_t \tag{A4}$$

where the first term vanishes because the columns of $U$ are orthonormal ($U^T U = I$). This implies sequencing of the feed according to $E_t$, given the $U$ matrix.

Notations

CMP = chemical mechanical polishing

$d$ = load variable

$\mathbf{d}$ = vector of the load variable (temporal or spatial mode)

$\mathbf{d}_0$ = vector of the load variable (temporal mode)

$d_s$ = spatial part of the load disturbance

$d_t$ = temporal part of the load disturbance

$d_t^*$ = transformed load variable (load strength indicator)

$G$ = process transfer function

$e$ = error

$e^*$ = transformed error

$e_{CL}$ = closed-loop (under control) error

$E_{min}$ = least-squares error for a nonsquare system

$e_{OL}$ = open-loop (without control) error

I = integral-only controller

$k$ = number of feeds


$K$ = controller transfer function

$K_c$ = controller gain

$K_{diag}$ = diagonal controller under SVD

$K_p$ = steady-state gain matrix

$K_{pi}$ = ith column of $K_p$

$K_p^{\dagger}$ = pseudo-inverse of $K_p$

L2R = left-to-right (based on a normal distribution) policy

$m$ = number of outputs

$n$ = number of inputs

$N$ = normal distribution

$p$ = pressure (for CMP)

PI = proportional-integral controller

PI$^2$ = proportional-plus-double-integral controller

PSD = power spectrum density

R2L = right-to-left (based on a normal distribution) policy

R2R = run to run

RR = removal rate

$S$ = sensitivity function

SVD = singular value decomposition

$u$ = input variable

$U$ = output orthonormal matrix (from SVD)

$v$ = rotation speed (for CMP)

$V$ = input orthonormal matrix (from SVD)

$y$ = output variable

$y_{avg,i}$ = average value of the output in temporal mode at the ith position

$y_{best}$ = achievable performance ($E_{min} = y_{set} - y_{best}$)

$y_{set}$ = set point of $y$

$z$ = z-transformation variable

Greek Letters

$E$ = deviation from the mean value

$\sigma$ = singular value

$\omega$ = frequency

$\omega_N$ = normalized frequency

Literature Cited

(1) Ingolfsson, A.; Sachs, E. Stability and sensitivity of an EWMA controller. J. Qual. Technol. 1993, 25, 271-287.

(2) Sachs, E.; Hu, A.; Ingolfsson, A. Run by run process control: combining SPC and feedback control. IEEE Trans. Semicond. Manuf. 1995, 8, 26-43.

(3) Butler, S. W.; Stefani, J. A. Supervisory run-to-run control of polysilicon gate etch using in situ ellipsometry. IEEE Trans. Semicond. Manuf. 1994, 7, 193-201.

(4) Chen, A.; Guo, R. S. Age-based double EWMA controller and its application to CMP processes. IEEE Trans. Semicond. Manuf. 2001, 14, 11-19.

(5) Boning, D. S.; Moyne, W. P.; Smith, T. H.; Moyne, J.; Telfeyan, R.; Hurwitz, A.; Shellman, S.; Taylor, J. Run by run control of chemical-mechanical polishing. IEEE Trans. Compon. Packag. Manuf. Technol. Part C 1996, 19, 307-314.

(6) Qin, S. J.; Scheid, G. W.; Riley, T. J. Adaptive run-to-run control and monitoring for a rapid thermal processor. J. Vac. Sci. Technol. B 2003, 21, 301-310.

(7) Del Castillo, E.; Hurwitz, A. M. Run-to-run process control: literature review and extensions. J. Qual. Technol. 1997, 29, 184-196.

(8) Nagrath, D.; Bequette, B. W.; Cramer, S. M. Evolutionary operation and control of chromatographic processes. AIChE J. 2003, 49, 82-95.

(9) Patel, N. S.; Miller, G. A.; Guinn, C.; Jenkins, S. T. Device dependent control of chemical-mechanical polishing of dielectric films. IEEE Trans. Semicond. Manuf. 2000, 13, 331-343.

(10) Ljung, L.; Soderstrom, T. Theory and Practice of Recursive Identification; MIT Press: Cambridge, MA, 1983.

(11) Moyne, J.; Del Castillo, E.; Hurwitz, A. M. Run-to-Run Control in Semiconductor Manufacturing; CRC Press: Boca Raton, FL, 2001.

(12) Del Castillo, E. Statistical Process Adjustment for Quality Control; Wiley: New York, 2002.

(13) Skogestad, S.; Postlethwaite, I. Multivariable Feedback Control; John Wiley & Sons: Chichester, U.K., 1996.

(14) Strang, G. Linear Algebra and Its Applications, 3rd ed.; Harcourt Brace Jovanovich: San Diego, CA, 1988.

(15) Stewart, G. W. Introduction to Matrix Computations; Academic Press: New York, 1973.

(16) Edgar, T. F.; Butler, S. W.; Campbell, W. J.; Pfeiffer, C.; Bode, C.; Hwang, S. B.; Balakrishnan, K. S.; Hahn, J. Automatic control of microelectronics manufacturing: practices, challenges and possibilities. Automatica 2000, 36, 1567-1603.

(17) Kaufman, F. B.; Thompson, D. B.; Broadie, R. E.; Jaso, M. A.; Guthrie, W. L.; Pearson, D. J.; Small, M. B. Chemical-mechanical polishing for fabricating patterned W metal features as chip interconnects. J. Electrochem. Soc. 1991, 138, 3460-3465.

(18) Yao, C. H.; Feke, D. L.; Robinson, K. M.; Meikle, S. The influence of feature-scale surface geometry on CMP processes. J. Electrochem. Soc. 2000, 147, 3094-3099.

(19) Chen, C. Y.; Yu, C. C.; Shen, S. H.; Ho, M. Operational aspects of chemical mechanical polishing: polish pad profile optimization. J. Electrochem. Soc. 2000, 147, 3922-3930.

(20) Chiu, J. B.; Yu, C. C.; Shen, S. H. Application of soft landing to the process control of chemical mechanical polishing. Microelectron. Eng. 2003, 65, 345-356.

(21) Kao, Y. C.; Yu, C. C.; Shen, S. H. Robust operation of copper chemical mechanical polishing. Microelectron. Eng. 2003, 65, 61-75.

(22) Prasad, S.; Loh, W.; Kapoor, A.; Chang, E.; Stein, B.; Boning, D.; Chung, J. Statistical metrology for characterizing CMP processes. Microelectron. Eng. 1997, 33, 231-240.

(23) Chiu, J. B.; Su, A. J.; Yu, C. C.; Shen, S. H. Planarization strategy of Cu CMP: interaction between plated copper thickness and removal rate. J. Electrochem. Soc. 2004, 151, G217-G222.

(24) Shiu, S. J.; Yu, C. C.; Shen, S. H. Multivariable Control of Multi-Zone Chemical Mechanical Polishing. J. Vac. Sci. Technol. B 2004, 22, 1679-1687.

(25) Preston, F. W. The theory and design of plate glass polishing machines. J. Soc. Glass Technol. 1927, 11, 214-256.

Received for review September 5, 2004. Revised manuscript received February 4, 2005. Accepted April 15, 2005.