(1)

Multiple Regression (I)

(2)

Example: vending machine

install.packages('MPV')
library(MPV)

data(softdrink)

y: soft drink delivery time (minutes)

x1: delivery volume (number of cases)

x2: distance walked

25 obs.
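A minimal R sketch reproducing the two simple regressions summarized on the next slide (column names y, x1, x2 as in the MPV softdrink data loaded above):

# fit delivery time against each predictor separately
fm1 <- lm(y ~ x1, data = softdrink)   # volume only
fm2 <- lm(y ~ x2, data = softdrink)   # distance only
summary(fm1)
summary(fm2)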

(3)

fm1 = lm(formula = y ~ x1)
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    3.321       1.371    2.422    0.0237 *
x1             2.176       0.124   17.546             ***

Residual standard error: 4.181 on 23 degrees of freedom
Multiple R-squared: 0.9305, Adjusted R-squared: 0.9275
F-statistic: 307.8 on 1 and 23 DF, p-value: 8.22e-15

fm2 = lm(formula = y ~ x2)
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) 4.961       2.337360    2.123    0.0448 *
x2          0.042569    0.004506    9.447             ***

Residual standard error: 7.179 on 23 degrees of freedom
Multiple R-squared: 0.7951, Adjusted R-squared: 0.7862
F-statistic: 89.24 on 1 and 23 DF, p-value: 2.214e-09

(4)
(5)

Cook's distance

Cook's distance measures the influence of the ith data point on the fitted model; in other words, it tells how much the fitted values change when the ith case is deleted. The formula for Cook's distance, D_i, is

D_i = \frac{\sum_{j=1}^{n} (\hat{y}_j - \hat{y}_{j(i)})^2}{p \, MSE}

where

\hat{y}_j is the predicted (fitted) value of the jth observation;

\hat{y}_{j(i)} is the predicted value of the jth observation using a new regression equation found by deleting the ith case;

p is the number of parameters in the model;

MSE is the mean square error.
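In R these values are available directly from a fitted lm object; a short sketch using fm1 from the example above (the 4/n cutoff is just a common rule of thumb):

d <- cooks.distance(fm1)        # Cook's distance for every observation
which(d > 4 / length(d))        # flag potentially influential cases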

(6)

plot(fm1)

(7)

plot(fm1)

(8)

Multiple Regression Model

1. The k-Variable Multiple Regression Model
2. The F Test of a Multiple Regression Model
3. How Good is the Regression?
4. Tests of the Significance of Individual Regression Parameters
5. Testing the Validity of the Regression Model
6. Using the Multiple Regression Model for Prediction

(9)

The Estimated Regression Relationship

The estimated regression relationship:

\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k

where \hat{Y} is the predicted value of Y, the value lying on the estimated regression surface. The terms b_0, ..., b_k are the least-squares estimates of the population regression parameters \beta_i.

The actual, observed value of Y is the predicted value plus an error:

y_j = b_0 + b_1 x_{1j} + b_2 x_{2j} + \cdots + b_k x_{kj} + e_j

(10)

Least-Squares Estimation: The 2-Variable Normal Equations

Minimizing the sum of squared errors with respect to the estimated coefficients b_0, b_1, and b_2 yields the following normal equations:

\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2

\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2

\sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2

(11)

The Matrix Approach to Regression Analysis

The population regression model:

\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & x_{12} & x_{13} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & x_{23} & \cdots & x_{2k} \\
1 & x_{31} & x_{32} & x_{33} & \cdots & x_{3k} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nk}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{bmatrix}

that is, Y = X\beta + \varepsilon.

The estimated regression model:

Y = Xb + e

(12)

The Matrix Approach to Regression Analysis

The normal equations: X'Xb = X'Y

Estimators: b = (X'X)^{-1} X'Y

Predicted values: \hat{Y} = Xb = X(X'X)^{-1}X'Y = HY

V(b) = \sigma^2 (X'X)^{-1}

s^2(b) = MSE \, (X'X)^{-1}
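These matrix formulas can be checked numerically against lm(); a sketch for the softdrink data:

X <- cbind(1, softdrink$x1, softdrink$x2)           # design matrix with intercept column
Y <- softdrink$y
b <- solve(t(X) %*% X, t(X) %*% Y)                  # b = (X'X)^{-1} X'Y
H <- X %*% solve(t(X) %*% X) %*% t(X)               # hat matrix, so Yhat = H Y
Yhat <- H %*% Y
MSE <- sum((Y - Yhat)^2) / (nrow(X) - ncol(X))      # residual mean square
s2_b <- MSE * solve(t(X) %*% X)                     # s^2(b)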

(13)

Decomposition of the Total Deviation in a Multiple Regression Model

Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE

Y - \hat{Y}: error deviation
\hat{Y} - \bar{Y}: regression deviation
Y - \bar{Y}: total deviation

(14)

The Overall F Test of a Multiple Regression Model

A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X_1, X_2, ..., X_k:

H0: \beta_1 = \beta_2 = ... = \beta_k = 0
H1: Not all the \beta_i (i = 1, 2, ..., k) are 0

Source of Variation  Sum of Squares  Degrees of Freedom  Mean Square
Regression           SSR             k                   MSR = SSR/k
Error                SSE             n - (k+1)           MSE = SSE/(n - (k+1))
Total                SST             n - 1               MST = SST/(n - 1)

F Ratio: F = MSR/MSE
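In R the overall F statistic is part of the model summary; a quick sketch with fm1 from the example above:

summary(fm1)$fstatistic   # F value with its numerator and denominator df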

(15)

How Good is the Regression?

The multiple coefficient of determination, R^2, measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

R^2 = SSR/SST = 1 - SSE/SST

The mean square error is an unbiased estimator of the variance of the population errors \varepsilon, denoted by \sigma^2:

MSE = \frac{SSE}{n - (k+1)} = \frac{\sum (y - \hat{y})^2}{n - (k+1)}

Standard error of estimate:

s = \sqrt{MSE}

(16)

Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination

SST = SSR + SSE

R^2 = SSR/SST = 1 - SSE/SST

The adjusted multiple coefficient of determination, \bar{R}^2, is the coefficient of determination with the SSE and SST divided by their respective degrees of freedom:

\bar{R}^2 = 1 - \frac{SSE/(n - (k+1))}{SST/(n - 1)}

Example 11-1: s = 1.911, R-sq = 96.1%, R-sq(adj) = 95.0%
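Both quantities are reported by summary() and can be recomputed by hand; a sketch with fm1:

s <- summary(fm1)
c(R2 = s$r.squared, adjR2 = s$adj.r.squared)
# by hand: SSE/(n-(k+1)) is sum(resid(fm1)^2)/fm1$df.residual,
# and var(softdrink$y) already divides SST by n - 1
1 - (sum(resid(fm1)^2) / fm1$df.residual) / var(softdrink$y)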

(17)

Measures of Performance in Multiple Regression and the ANOVA Table

Source of Variation  Sum of Squares  Degrees of Freedom     Mean Square            F Ratio
Regression           SSR             k                      MSR = SSR/k            F = MSR/MSE
Error                SSE             n - (k+1) = n - k - 1  MSE = SSE/(n - (k+1))
Total                SST             n - 1                  MST = SST/(n - 1)

R^2 = SSR/SST = 1 - SSE/SST

\bar{R}^2 = 1 - \frac{SSE/(n - (k+1))}{SST/(n - 1)} = 1 - \frac{MSE}{MST}

F = \frac{MSR}{MSE} = \frac{R^2 / k}{(1 - R^2) / (n - (k+1))}

(18)

Tests of the Significance of Individual Regression Parameters

Hypothesis tests about individual regression slope parameters:

(1) H0: \beta_1 = 0   H1: \beta_1 ≠ 0
(2) H0: \beta_2 = 0   H1: \beta_2 ≠ 0
. . .
(k) H0: \beta_k = 0   H1: \beta_k ≠ 0

Test statistic for test i (with n - (k+1) degrees of freedom):

t_{(n-(k+1))} = \frac{b_i - 0}{s(b_i)}
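These t statistics are exactly the "t value" column of an R coefficient table; a minimal sketch:

coef(summary(fm1))   # Estimate, Std. Error, t value, Pr(>|t|) for each b_i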

(19)

pairs(softdrink)

(20)

Correlations:

       y     x1     x2
y   1.000  0.965  0.892
x1  0.965  1.000  0.824
x2  0.892  0.824  1.000

Call: lm(y ~ x1 + x2)
Coefficients:
(Intercept)      x1      x2
      2.341   1.616  0.0144

Source of Variation  Sum of Squares  Degrees of Freedom  Mean Square  F Ratio
Regression           5550            2                   2775         261.2
Error                233.7           22                  10.62
Total                SST             24

(21)

            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) 2.341231    1.096730    2.135  *
x1          1.615907    0.170735    9.464  ***
x2          0.014385    0.003613    3.981  ***

Residual standard error: 3.259 on 22 degrees of freedom
Multiple R-squared: 0.9596, Adjusted R-squared: 0.9559
F-statistic: 261.2 on 2 and 22 DF, p-value: 4.687e-16

confint(model)
          2.5 %       97.5 %
x1  1.261824662  1.96998976
x2  0.006891745  0.02187791

95% C.I. on the mean delivery time for an outlet requiring 8 cases and a distance of 275 feet?
95% P.I. on the delivery time?

fitted = 19.22432, C.I. [17.65, 20.79], P.I. [12.285, 26.16]
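Both intervals come from predict() applied to the fitted two-predictor model; a sketch:

model <- lm(y ~ x1 + x2, data = softdrink)
new <- data.frame(x1 = 8, x2 = 275)
predict(model, new, interval = "confidence")  # C.I. on the mean response
predict(model, new, interval = "prediction")  # P.I. on a single new delivery
confint(model)                                # 95% C.I.s for the coefficients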

(22)

residual analysis

 i      y  x1    x2  fitted      e  R-student     h
 2  11.50   3   220   10.35   1.15       0.36  0.07
 3  12.03   3   340   12.08  -0.05      -0.02  0.10
 4  14.88   4    80    9.96   4.92       1.64  0.09
 5  13.75   6   150   14.19  -0.44      -0.14  0.08
 6  18.11   7   330   18.40  -0.29      -0.09  0.04
 7   8.00   2   110    7.16   0.84       0.26  0.08
 8  17.83   7   210   16.67   1.16       0.36  0.06
 9  79.24  30  1460   71.82   7.42       4.31  0.50
10  21.50   5   605   19.12   2.38       0.81  0.20
11  40.33  16   688   38.09   2.24       0.71  0.09
12  21.00  10   215   21.59  -0.59      -0.19  0.11
13  13.50   4   255   12.47   1.03       0.32  0.06
14  19.75   6   462   18.68   1.07       0.33  0.08
15  24.00   9   448   23.33   0.67       0.21  0.04
16  29.00  10   776   29.66  -0.66      -0.22  0.17
17  15.35   6   200   14.91   0.44       0.13  0.06
18  19.00   7   132   15.55   3.45       1.12  0.10
19   9.50   3    36    7.71   1.79       0.57  0.10
20  35.10  17   770   40.89  -5.79      -2.00  0.10
21  17.90  10   140   20.51  -2.61      -0.87  0.17
22  52.32  26   810   56.01  -3.69      -1.49  0.39
23  18.75   9   450   23.36  -4.61      -1.48  0.04
24  19.83   8   635   24.40  -4.57      -1.54  0.12
25  10.75   4   150   10.96  -0.21      -0.07  0.07
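These diagnostics match what base R computes from the fitted model; a sketch (rstudent() gives the externally studentized residuals tabulated above, hatvalues() the leverages):

round(data.frame(y = softdrink$y, x1 = softdrink$x1, x2 = softdrink$x2,
                 fitted = fitted(model), e = resid(model),
                 t = rstudent(model), h = hatvalues(model)), 2)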

(23)
(24)

Residual Autocorrelation and the Durbin-Watson Test

An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.

Lagged Residuals

 i    e_i  e_{i-1}  e_{i-2}  e_{i-3}  e_{i-4}
 1    1.0     *        *        *        *
 2    0.0    1.0       *        *        *
 3   -1.0    0.0      1.0       *        *
 4    2.0   -1.0      0.0      1.0       *
 5    3.0    2.0     -1.0      0.0      1.0
 6   -2.0    3.0      2.0     -1.0      0.0
 7    1.0   -2.0      3.0      2.0     -1.0
 8    1.5    1.0     -2.0      3.0      2.0
 9    1.0    1.5      1.0     -2.0      3.0
10   -2.5    1.0      1.5      1.0     -2.0

The Durbin-Watson test (first-order autocorrelation):

H0: \rho_1 = 0
H1: \rho_1 ≠ 0

The Durbin-Watson test statistic:

d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}
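The statistic is easy to compute from the residuals, and the lmtest package (an assumption here, not part of the slides) provides a formal test:

e <- resid(model)
d <- sum(diff(e)^2) / sum(e^2)   # Durbin-Watson statistic
d
library(lmtest)                  # assumes lmtest is installed
dwtest(model)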

(25)

Polynomial Regression

When the response function is curvilinear, one case of a polynomial regression model with a single predictor variable is:

Transformed variables: a model with transformed variables may have a complicated curvilinear response function, yet it is still a special case of the general linear regression model. Consider the following model in which the variable Y is transformed:

Although the response surface of this model is complicated, it can still be handled as a general linear regression model. Letting ..., the regression model becomes:
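In R a polynomial term is added with I() (or poly()); a minimal sketch of the quadratic fit in x2 used in the HW on a later slide:

pfit <- lm(y ~ x2 + I(x2^2), data = softdrink)  # y = b0 + b1*x2 + b2*x2^2 + e
summary(pfit)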

(26)

HW

Fit the model y = b0 + b1*x2 + b2*x2^2 + e to all the 25 cases. Find the LSE, standardized residuals, and Cook's D values. Write down the R^2 values and state your choice for the model with or without x2^2.

Remove an influential obs. and fit the model y ~ x1 + x2 to the remaining 24 obs. How, if at all, has removal of the influential case modified the fitted model and the fit?

(27)

Variable selection

(28)
(29)

       X1     X2     X3     Y
X1  1.000  0.924  0.458  0.843
X2  0.924  1.000  0.085  0.878
X3  0.458  0.085  1.000  0.142
Y   0.843  0.878  0.142  1.000

(30)

Response: Y
          Df  Sum Sq  Mean Sq  F value     Pr(>F)
X1         1  352.27   352.27   44.305  3.024e-06 ***
Residuals 18  143.12     7.95

Response: Y
          Df  Sum Sq  Mean Sq  F value    Pr(>F)
X2         1  381.97   381.97   60.617   3.6e-07 ***
Residuals 18  113.42     6.30

(31)

Response: Y
          Df  Sum Sq  Mean Sq  F value     Pr(>F)
X1         1  352.27   352.27  57.2768  1.131e-06 ***
X2         1   33.17    33.17   5.3931    0.03373 *
X3         1   11.55    11.55   1.8773    0.18956
Residuals 16   98.40     6.15

Response: Y
          Df  Sum Sq  Mean Sq  F value     Pr(>F)
X1         1  352.27   352.27  54.4661  1.075e-06 ***
X2         1   33.17    33.17   5.1284     0.0369 *
Residuals 17  109.95     6.47
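Sequential (extra) sums of squares like these come from anova() on a fit with the predictors entered in the stated order; a sketch, assuming a data frame dat with columns Y, X1, X2, X3 (the dataset itself is not named on these slides):

anova(lm(Y ~ X1 + X2 + X3, data = dat))  # rows give SSR(X1), SSR(X2|X1), SSR(X3|X1,X2)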

(32)

Since the extra sum of squares measures the difference between the error sums of squares of the old and new regression models, we can define:

(7.1a) or (7.1b)

If X2 is the predictor variable added later, then

(7.2a) or (7.2b)

(33)

(7.3a) or (7.3b)

Moreover, (7.4a) is equivalent to (7.4b).

(34)
(35)
(36)

This extra sum of squares has two degrees of freedom, because it can be expressed as the sum of two extra sums of squares that each have one degree of freedom, for example:

(7.11)

The mean square can therefore be computed as follows:
(37)

• Testing a single coefficient

In addition, the above test can also be carried out using the general linear test approach introduced in Section 2.8. We first consider the first-order regression model with three predictor variables:

(7.12)

The hypotheses are:

(7.13)

When the null hypothesis in (7.13) holds, the reduced model is:

(7.14)

(38)

That is:

The difference between the two error sums of squares in the numerator above is exactly the extra sum of squares in (7.3a):

Therefore the general linear test statistic is:

(7.15)
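In R the general linear test amounts to comparing the reduced and full fits with anova(); a sketch using the hypothetical data frame dat from above:

full    <- lm(Y ~ X1 + X2 + X3, data = dat)
reduced <- lm(Y ~ X1 + X2, data = dat)
anova(reduced, full)  # F = [(SSE_R - SSE_F)/(df_R - df_F)] / MSE_F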

(39)
(40)

The null hypothesis of the test is:

(7.16)

Using the general linear test, the reduced model under the null hypothesis is:

(7.17)

The test statistic is:

(7.18)

(41)
(42)

• Testing all coefficients

The overall F test of equation (6.39) tests whether a regression relationship exists between the response variable Y and the full set of predictor variables X. The null hypothesis is:

(7.21)

and the test statistic is:

(7.22)

(43)

For the partial F test of a specific regression coefficient, the null hypothesis is:

(7.23)

and the test statistic is:

(7.24)

Equation (6.51b) gives an equivalent test statistic:

(7.25)

(44)

This is another kind of partial F test, with null hypothesis:

(7.26)

For notational convenience, we move the (p - q) variables to be tested to the end; the test statistic is:

(7.27)

(45)

Sometimes a test about the regression coefficients does not concern one or several individual coefficients, in which case the extra sum of squares cannot be used. We must then fit the full model and the reduced model separately and carry out the general linear test. For example, suppose the full model contains three predictor variables X:

(7.30)

and we want to test:

(7.31)

We can then fit the full model first, followed by the reduced model:

(7.32)

(46)

In the full model (7.30):

(7.33)

The reduced model is:

(7.34)

(47)

• Stepwise procedures

- Forward selection: add one variable at a time to the model, on the basis of its F statistic.

- Backward elimination: remove one variable at a time, on the basis of its F statistic.

- Stepwise regression: adds variables to the model and subtracts variables from the model, on the basis of the F statistic.
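R's built-in step() runs these searches, though it ranks candidate moves by AIC rather than an F statistic; a sketch with the hypothetical dat frame:

full <- lm(Y ~ X1 + X2 + X3, data = dat)
step(full, direction = "backward")
step(lm(Y ~ 1, data = dat), scope = ~ X1 + X2 + X3, direction = "forward")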

(48)

Stepwise regression flowchart:

1. Compute the F statistic for each variable not in the model.
2. Is there at least one variable with p-value < Pin? If no, stop.
3. Enter the most significant (smallest p-value) variable into the model.
4. Calculate the partial F for all variables in the model.
5. Is there a variable with p-value > Pout? If yes, remove that variable. Return to step 1.

(49)

Standardized regression coefficients

• Unit normal scaling: center each variable and divide by its sample standard deviation (divisor n - 1),

z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad y_i^0 = \frac{y_i - \bar{y}}{s_y}

so the model becomes

y_i^0 = b_1^* z_{i1} + \cdots + b_k^* z_{ik} + \varepsilon_i, \qquad i = 1, \ldots, n

• Unit length scaling: center each variable and divide by the square root of its corrected sum of squares,

z_{ij}^* = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_{jj}}}, \qquad s_{jj} = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2

(50)

After the correlation transformation in (7.44), the fitted regression model is called the standardized regression model:

(7.45)

We can also see that the parameters of the standardized regression model and the parameters of the ordinary multiple regression model (6.7) are related as follows:

(7.46a)

(7.46b)
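Unit normal scaling is what scale() does in R, so the standardized model can be fit directly; a sketch on the softdrink data (no intercept, since all variables are centered):

zfit <- lm(scale(y) ~ scale(x1) + scale(x2) - 1, data = softdrink)
coef(zfit)   # standardized coefficients b*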

(51)

[Figure: Relationship between VIF and R_h^2; VIF (0 to 100) plotted against R_h^2 (0.0 to 1.0), rising sharply as R_h^2 approaches 1.]

The variance inflation factor associated with X_h:

VIF(X_h) = \frac{1}{1 - R_h^2}

where R_h^2 is the R^2 value obtained for the regression of X_h on the other independent variables.
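Each VIF can be computed from its auxiliary regression; a sketch for the two softdrink predictors (car::vif() gives the same values if the car package is available):

r2_x1 <- summary(lm(x1 ~ x2, data = softdrink))$r.squared
1 / (1 - r2_x1)   # VIF for x1; with only two predictors, x2 has the same VIF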

(52)

Solutions to the Multicollinearity Problem

• Drop a collinear variable from the regression

• Change the sampling plan to include elements outside the multicollinearity range

• Transformations of variables

• Ridge regression
