A unified model for detecting efficient and inefficient outliers in data envelopment analysis


Contents lists available at ScienceDirect

Computers & Operations Research

journal homepage: www.elsevier.com/locate/cor


Wen-Chih Chen (a, ∗), Andrew L. Johnson (b)

(a) Department of Industrial Engineering and Management, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu 300, Taiwan
(b) Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, USA

ARTICLE INFO

Available online 21 June 2009

Keywords: Data envelopment analysis; Outlier; Post analysis

ABSTRACT

Data envelopment analysis (DEA) uses extreme observations to identify superior performance, making it vulnerable to outliers. This paper develops a unified model to identify both efficient and inefficient outliers in DEA. Finding both types is important since many post analyses, after measuring efficiency, depend on the entire distribution of efficiency estimates. Thus, outliers that are distinguished by poor performance can significantly alter the results. Besides allowing the identification of outliers, the method described is consistent with a relaxed set of DEA axioms. Several examples demonstrate the need for identifying both efficient and inefficient outliers and the effectiveness of the proposed method. Applications of the model reveal that observations with low efficiency estimates are not necessarily outliers. In addition, a strategy to accelerate the computation is proposed that can also be applied to influential observation detection.

1. Introduction

Data envelopment analysis (DEA), introduced by Charnes et al. [6], is a mathematical programming technique for evaluating the efficiency of an observation relative to a set of similar observations. Generally viewed as a success story for the operations research community [15], DEA's real-world relevance, diffusion, and global popularity are evident in literature such as Seiford et al. [27]. It has been applied to financial institutions (e.g., [10,14]) and technology investment evaluation (e.g., [9]), among many other areas.

DEA efficiency estimates are quite sensitive to the presence of outliers, since the method uses extreme observations to identify superior performance [28]. However, outliers can be difficult to identify, because each record describing an observation is typically a high-dimensional vector with multiple inputs and outputs. Some outliers result from measurement or recording errors, while others reflect unusual characteristics, including factors related to the external environment or other uncontrollable factors. Outliers may also simply be observations with a low probability of occurrence. When such observations differ greatly from the remainder of the data set, they can represent unexpected knowledge to be gained.

Although the concept of an outlier is intuitive, a rigorous definition is somewhat elusive. In the literature, outliers have been loosely defined as an observation (or a set of observations) that appears to be inconsistent with the remainder of that set of data [5]. In the efficiency estimation context, many authors term observations with significant influence on others' efficiency estimates influential observations (e.g., [23,32]). An influential observation typically owes its influence to the fact that it is an outlier and supports part of the deterministic frontier. However, Pastor et al. [23] and Simar [30] observe that an outlier is not necessarily an influential observation, and that an influential observation is not necessarily far away from the data cloud.

∗ Corresponding author.
E-mail addresses: [email protected] (W.-C. Chen), [email protected] (Andrew L. Johnson).

In the nonparametric efficiency analysis literature, some studies with a particular interest in efficiency estimates attempt to detect influential observations (e.g., [23,25,32]), while others focus on detecting outliers that lie away from the data cloud (e.g., [13,31]). In terms of underlying theory, the former studies (e.g., [23,25,32]) rely on DEA efficiency estimates and the frontier concept, whereas Wilson [31] and Fox et al. [13] examine the dissimilarity of an input–output record to other observations. Simar [30] notes that these dissimilarity-based approaches do not take the frontier aspects of the problem into account. This is a significant limitation, because researchers are mostly interested in detecting overly efficient outliers with the most influence on the efficiency measures. As a result, another direction of the outlier literature considers statistical inference in DEA's nonparametric context (e.g., [30]).

This paper is motivated by a Web-based DEA benchmarking tool for warehouse operations [19]. Although the Internet allows researchers to collect data quickly and securely, the data may be entered in error, or may not represent an actual facility. These drawbacks increase the need for effective data filtering techniques that can identify inefficient outliers, a growing concern for researchers. Performance benchmarking is a set of processes and practices used to determine (i) reference values for selected performance indices, and (ii) factors for key processes affecting performance [21]. Both goals rely on quality data, and the latter is also affected by inefficient outliers. When examining commercial pharmacies, Johnson and McGinnis [18] demonstrate the effect of inefficient outliers on the results of a second-stage regression used to identify attributes that correlate positively with efficiency. They find that a particular attribute, the population of the surrounding area, appears unrelated to store efficiency when all observations are included, but correlates positively with efficiency once the inefficient outliers are removed.

Inefficient outliers are also an issue in post analyses using DEA efficiency estimates, e.g., statistical testing (determining whether two populations are equally efficient), cross-validation (using the weights selected by each observation to define a common set of weights in a post analysis), distribution analysis (determining whether DEA efficiency estimates are consistent with economic theories of market efficiency), benchmarking (identifying best and worst practices), and industry trend analysis (identifying the efficient observations used as benchmarks for a large set of observations). Only Johnson and McGinnis [18] employ the “inefficient frontier” concept to detect possible outliers that perform poorly. But the “inefficient frontier” concept is ad hoc and is not consistent with the standard DEA axioms. Production theory assumes that observations are bounded by those with superior performance, and that interior points (with respect to the efficient frontier) are always feasible. Simply applying existing procedures, e.g., Pastor et al. [23], to the inefficient observations violates the standard axioms of DEA (production theory) and is thus logically problematic.

This paper aims to identify outliers that influence both efficiency estimates and DEA post analysis. We approach the problem by identifying a set of axioms and developing an approach consistent with them. Adopting a relaxed set of DEA axioms allows the detection and ranking of both efficient outliers that influence the efficiency estimates and inefficient outliers that may influence post analysis procedures. Identifying either efficient or inefficient outliers separately is also possible. Applications of the model to case studies demonstrate that intuitively flagging the worst-performing observations as inefficient outliers is not necessarily correct. In addition, a strategy to accelerate the computation is proposed that can be applied to influential observation detection methods such as Pastor et al. [23].

The remainder of this paper is organized as follows. The next section proposes new outlier measures taking efficiency influence into account to measure the effect of an outlier or a set of outliers. Section 3 describes four case studies that demonstrate the proposed method. Section 4 offers a computational remark, and Section 5 states the conclusions.

2. Outlier measures

This section introduces the fundamentals of DEA and proposes new outlier measures considering efficiency influence.

2.1. Fundamentals

Consider an input set $I$ and an output set $J$. Denote $x \in \mathbb{R}^{|I|}_{+}$ as an input vector and $y \in \mathbb{R}^{|J|}_{+}$ as an output vector. The production possibility set (PPS) $T$, representing the feasibility of transforming inputs into outputs, is defined as

$$T \equiv \{(x, y) : y \text{ can be produced by } x\}.$$

Shephard [29] defines the output distance function $D_o(x,y)$ and the input distance function $D_i(x,y)$ between any specific input–output bundle $(x,y)$ and the boundary of $T$ as follows:

$$D_o(x, y) \equiv \inf\{\theta : (x, y/\theta) \in T\},$$

$$D_i(x, y) \equiv \sup\{\rho : (x/\rho, y) \in T\}.$$

Distance functions measure how far $(x,y)$ must be moved, by scaling either its outputs or its inputs proportionally, to reach the boundary of $T$. Any $(x,y)$ with a distance function value of one lies on the boundary (frontier) of $T$.

In practice, $T$ is unknown. Given a set of observations $S$ with input–output vectors $(x_r, y_r)$, $r \in S$, the empirical production possibility set (EPPS) can be approximated using the following axioms [4]:

1. (Convexity) If $(x,y) \in T$ and $(x',y') \in T$, then $\lambda(x,y) + (1-\lambda)(x',y') \in T$ for $\lambda \in [0,1]$.
2. (Free disposability) $(x',y') \in T$ if $(x,y) \in T$, $x' \geq x$ and $y' \leq y$.

The EPPS can then be expressed as a set of linear inequalities in $|S|$ nonnegative variables and denoted as

$$\hat{T} \equiv \Big\{(x, y) : \sum_{r\in S} x_r\lambda_r \leq x;\ \sum_{r\in S} y_r\lambda_r \geq y;\ \sum_{r\in S}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\Big\}.$$

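The practicable estimates of the distance functions are then as follows:

$$[\hat{D}_i(x, y)]^{-1} = \min_{\theta,\lambda}\Big\{\theta : \sum_{r\in S} x_r\lambda_r \leq \theta x;\ \sum_{r\in S} y_r\lambda_r \geq y;\ \sum_{r\in S}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\Big\},$$

$$[\hat{D}_o(x, y)]^{-1} = \max_{\varphi,\lambda}\Big\{\varphi : \sum_{r\in S} x_r\lambda_r \leq x;\ \sum_{r\in S} y_r\lambda_r \geq \varphi y;\ \sum_{r\in S}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\Big\}.$$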

If $(x,y)$ is observed, and thus feasible, the distance function measures can be interpreted as technical efficiency [12],¹ which estimates the relative efficiency of a particular record $k \in S$ compared with all observations in $S$. This leads to one of the well-known DEA models, proposed by Banker et al. [4]:

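$$\theta^{S}_{k} = \min_{\theta,\lambda}\Big\{\theta : \sum_{r\in S} x_r\lambda_r \leq \theta x_k;\ \sum_{r\in S} y_r\lambda_r \geq y_k;\ \sum_{r\in S}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\Big\}, \tag{BCC.I}$$

$$\varphi^{S}_{k} = \max_{\varphi,\lambda}\Big\{\varphi : \sum_{r\in S} x_r\lambda_r \leq x_k;\ \sum_{r\in S} y_r\lambda_r \geq \varphi y_k;\ \sum_{r\in S}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\Big\}. \tag{BCC.O}$$

In summary, (BCC.I) and (BCC.O) provide radial efficiency estimates using $\hat{T}$, which is constructed from the observations according to the axioms of convexity and free disposability. Observation $k$ is said to be efficient, and on the efficient frontier, when its efficiency estimate is one.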
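Both BCC models are ordinary linear programs. The sketch below shows one way to solve them, assuming NumPy and SciPy's `linprog` with the HiGHS solver are available; the function names `bcc_output` and `bcc_input` and the three-observation data are hypothetical, chosen so the answers can be checked by hand.

```python
import numpy as np
from scipy.optimize import linprog


def bcc_output(X, Y, k):
    """Output-oriented BCC estimate phi_k^S (BCC.O).

    X: (n, m) inputs, Y: (n, s) outputs; row r is observation r.
    Decision vector z = [phi, lambda_1, ..., lambda_n].
    """
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[-1.0, np.zeros(n)]                    # linprog minimizes, so minimize -phi
    A_ub = np.vstack([
        np.hstack([np.zeros((m, 1)), X.T]),         # sum_r lambda_r x_r <= x_k
        np.hstack([Y[k].reshape(-1, 1), -Y.T]),     # phi*y_k - sum_r lambda_r y_r <= 0
    ])
    b_ub = np.r_[X[k], np.zeros(s)]
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)    # sum_r lambda_r = 1 (convexity)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * n, method="highs")
    return res.x[0]


def bcc_input(X, Y, k):
    """Input-oriented BCC estimate theta_k^S (BCC.I)."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                     # minimize theta
    A_ub = np.vstack([
        np.hstack([-X[k].reshape(-1, 1), X.T]),     # sum_r lambda_r x_r - theta*x_k <= 0
        np.hstack([np.zeros((s, 1)), -Y.T]),        # -sum_r lambda_r y_r <= -y_k
    ])
    b_ub = np.r_[np.zeros(m), -Y[k]]
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * n, method="highs")
    return res.x[0]


# Hypothetical single-input, single-output data: A=(1,1), B=(2,3), C=(2,2).
X = np.array([[1.0], [2.0], [2.0]])
Y = np.array([[1.0], [3.0], [2.0]])
print(round(bcc_output(X, Y, 2), 4), round(bcc_input(X, Y, 2), 4))   # 1.5 0.75
```

Observation C is dominated: its output could rise by 50% (B produces 3 with the same input), or its input could shrink to 75% (the midpoint of A and B produces 2 with input 1.5).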

It should be noted that $\hat{T}$ assumes free disposability, which implies that using more inputs and producing fewer outputs is always feasible. Influential measures based on (BCC.I) and (BCC.O) (e.g., [23]) will therefore not flag inefficient observations as outliers, regardless of how poorly they perform. Free disposability, as adopted in standard DEA models, in effect assumes that all inefficient observations are part of the PPS and does not allow for the concept of inefficient outliers; however, this paper finds that inefficient outliers matter. Johnson and McGinnis [18] develop the idea of the “inefficient frontier” to flag overly inefficient observations; however, they do not approach the problem with consideration of the DEA axioms, and their model is thus not consistent with the free disposability axiom. This paper relaxes the assumption of free disposability and uses only the part of $\hat{T}$ based on convexity, allowing potential outliers to be ranked by their influence.

2.2. New measures of outliers

This section proposes an outlier measure that can identify both efficient and inefficient outliers. These outliers are measured relative to a set constructed to be consistent with a subset of the DEA axioms, and individual outliers are ranked by their influence on the measures. For a data set $S$, adopting the convexity assumption, the convex hull of $S$ is as follows:

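$$\hat{T}^{S}_{\mathrm{conv}} \equiv \Big\{(x, y) : \sum_{r\in S} x_r\lambda_r = x;\ \sum_{r\in S} y_r\lambda_r = y;\ \sum_{r\in S}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\Big\}.$$

$\hat{T}^{S}_{\mathrm{conv}}$ is a part of $\hat{T}$ ($\hat{T}^{S}_{\mathrm{conv}} \subset \hat{T}$). Extending the definition of free disposability, the free disposal hull of a set $A \subset \mathbb{R}^{|I|}_{+} \times \mathbb{R}^{|J|}_{+}$ is [16]:

$$\mathrm{FDH}(A) \equiv \{(x', y') : x' \geq x,\ y' \leq y \text{ for some } (x, y) \in A\}.$$

It is clear that $\hat{T} = \mathrm{FDH}(\hat{T}^{S}_{\mathrm{conv}})$ [16]; namely, $\hat{T}^{S}_{\mathrm{conv}}$ is the essential component of the EPPS before free disposability, the axiom that makes identifying inefficient outliers impossible, is applied. Therefore, $\hat{T}^{S}_{\mathrm{conv}}$ characterizes the most important properties of $\hat{T}$ and can be used to identify both efficient and inefficient outliers.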
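The practical difference between $\hat{T}$ and $\hat{T}^{S}_{\mathrm{conv}}$ can be seen with two feasibility LPs. This is a minimal sketch assuming NumPy and SciPy's `linprog`; the helper names and the data (the same hypothetical points A=(1,1), B=(2,3), C=(2,2)) are illustrative. A poorly performing bundle such as (x, y) = (2, 0.5) belongs to $\hat{T}$, because free disposability admits every dominated bundle, but not to the convex hull, which is why only the latter can expose inefficient outliers.

```python
import numpy as np
from scipy.optimize import linprog


def _feasible(A_ub, b_ub, A_eq, b_eq, n):
    """True if some lambda >= 0 satisfies the given linear system."""
    res = linprog(np.zeros(n), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.status == 0          # status 2 means the system is infeasible


def in_T_hat(X, Y, x, y):
    """(x, y) in T-hat: convexity plus free disposability (inequalities)."""
    n = X.shape[0]
    A_ub = np.vstack([X.T, -Y.T])                   # sum lam x_r <= x, sum lam y_r >= y
    b_ub = np.r_[x, -np.asarray(y, dtype=float)]
    return _feasible(A_ub, b_ub, np.ones((1, n)), [1.0], n)


def in_conv_hull(X, Y, x, y):
    """(x, y) in the convex hull T-hat_conv^S: convexity only (equalities)."""
    n = X.shape[0]
    A_eq = np.vstack([X.T, Y.T, np.ones((1, n))])
    b_eq = np.r_[x, y, 1.0]
    return _feasible(None, None, A_eq, b_eq, n)


X = np.array([[1.0], [2.0], [2.0]])
Y = np.array([[1.0], [3.0], [2.0]])
print(in_T_hat(X, Y, [2.0], [0.5]), in_conv_hull(X, Y, [2.0], [0.5]))   # True False
```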

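To identify outliers that influence both the efficiency estimates and DEA post analysis, radial measures with respect to $\hat{T}^{S}_{\mathrm{conv}}$ are proposed. For output-oriented analyses, a measure $\overline{\varphi}^{S}_{k}$ is defined as

$$\overline{\varphi}^{S}_{k} \equiv \max\{\varphi : (x_k, \varphi y_k) \in \hat{T}^{S}_{\mathrm{conv}}\} = \max_{\varphi,\lambda}\Big\{\varphi : \sum_{r\in S} x_r\lambda_r = x_k;\ \sum_{r\in S} y_r\lambda_r = \varphi y_k;\ \sum_{r\in S}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\Big\}. \tag{1}$$

The value of $\overline{\varphi}^{S}_{k}$ measures how much the outputs of observation $k$ can be scaled up proportionately while remaining in $\hat{T}^{S}_{\mathrm{conv}}$.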

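$\overline{\varphi}^{S}_{k} \geq 1$ (setting $\lambda_k = 1$ and $\varphi = 1$ is always feasible), and $(x_k, \varphi y_k) \notin \hat{T}^{S}_{\mathrm{conv}}$ if $\varphi > \overline{\varphi}^{S}_{k}$; the projected point $(x_k, \overline{\varphi}^{S}_{k} y_k)$ is referred to as lying on the outer boundary. Further, $\overline{\varphi}^{S}_{k}$ can be interpreted as the “distance” between $k$ and the outer boundary; that is, the “distance” to the outer boundary is $(100 \times \overline{\varphi}^{S}_{k})\%$ of $y_k$. $\overline{\varphi}^{S}_{k} = 1$ indicates that $k$ is on the outer boundary, since it cannot be scaled up while remaining in $\hat{T}^{S}_{\mathrm{conv}}$. This radial measure is identical to the output-oriented efficiency estimate, but with respect to $\hat{T}^{S}_{\mathrm{conv}}$ rather than $\hat{T}$. As a result, (1) has a structure similar to (BCC.O) and thus ties directly to efficiency estimation. Observations with output efficiency estimates equal to one will have $\overline{\varphi}^{S}_{k} = 1$; this is formally stated as follows:

Proposition 1. For $k \in S$, $\overline{\varphi}^{S}_{k} = 1$ if $\varphi^{S}_{k} = 1$.

Proof. See the appendix.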

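Another measure related to $k$, $\underline{\varphi}^{S}_{k}$, is defined as

$$\underline{\varphi}^{S}_{k} \equiv \min\{\varphi : (x_k, \varphi y_k) \in \hat{T}^{S}_{\mathrm{conv}}\} = \min_{\varphi,\lambda}\Big\{\varphi : \sum_{r\in S} x_r\lambda_r = x_k;\ \sum_{r\in S} y_r\lambda_r = \varphi y_k;\ \sum_{r\in S}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\Big\}. \tag{2}$$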

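Eq. (2) has the same interpretation as (1) but scales $k$ in the opposite direction. $0 \leq \underline{\varphi}^{S}_{k} \leq 1$, and $(x_k, \varphi y_k) \notin \hat{T}^{S}_{\mathrm{conv}}$ if $\varphi < \underline{\varphi}^{S}_{k}$. The projected point $(x_k, \underline{\varphi}^{S}_{k} y_k)$ is located on the inner boundary of $\hat{T}^{S}_{\mathrm{conv}}$. Similarly, $\underline{\varphi}^{S}_{k}$ represents the distance between $k$ and the inner boundary, i.e., the distance is $(100 \times \underline{\varphi}^{S}_{k})\%$ of $y_k$. $\underline{\varphi}^{S}_{k} = 1$ indicates that $k$ is on the inner boundary, and the movement passes through the output origin $(x_k, 0)$ when $\underline{\varphi}^{S}_{k} = 0$.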

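The “difference” between the projected points $(x_k, \overline{\varphi}^{S}_{k} y_k)$ and $(x_k, \underline{\varphi}^{S}_{k} y_k)$ is the width of the segment formed by intersecting $\hat{T}^{S}_{\mathrm{conv}}$ with the ray from the (output) origin $(x_k, 0)$ through observation $k \in S$. To be precise, the “width” is defined as $(\overline{\varphi}^{S}_{k} y_k - \underline{\varphi}^{S}_{k} y_k)/y_k = \overline{\varphi}^{S}_{k} - \underline{\varphi}^{S}_{k}$, which specifies the width as a percentage of $y_k$. In a single-output example, suppose $y_k = 100$ and the projected points are 120 and 70. Then $\overline{\varphi}^{S}_{k} = 1.2$ and $\underline{\varphi}^{S}_{k} = 0.7$, so $\overline{\varphi}^{S}_{k} - \underline{\varphi}^{S}_{k} = 0.5$; this is consistent with the original units of measure, since $0.5 \times 100 = 50 = 120 - 70$.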
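Measures (1) and (2) are the same LP solved in two directions. The following sketch assumes NumPy and SciPy's `linprog`; the function name `hull_measures` is hypothetical, and the three observations are constructed (all at the same input level) to reproduce the single-output example above.

```python
import numpy as np
from scipy.optimize import linprog


def hull_measures(X, Y, k):
    """Return (phi_bar, phi_under) for observation k, as in eqs. (1) and (2).

    phi_bar   = max phi s.t. (x_k, phi*y_k) lies in the convex hull of all rows;
    phi_under = min phi under the same constraints.
    Decision vector z = [phi, lambda_1, ..., lambda_n].
    """
    n, m = X.shape
    s = Y.shape[1]
    A_eq = np.vstack([
        np.hstack([np.zeros((m, 1)), X.T]),          # sum_r lambda_r x_r = x_k
        np.hstack([-Y[k].reshape(-1, 1), Y.T]),      # sum_r lambda_r y_r - phi*y_k = 0
        np.r_[0.0, np.ones(n)].reshape(1, -1),       # sum_r lambda_r = 1
    ])
    b_eq = np.r_[X[k], np.zeros(s), 1.0]
    bounds = [(None, None)] + [(0, None)] * n
    values = []
    for sign in (-1.0, 1.0):                         # -1: maximize phi, +1: minimize phi
        res = linprog(np.r_[sign, np.zeros(n)], A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        values.append(res.x[0])
    return values[0], values[1]


# Single-output example from the text: y_k = 100, projections at 120 and 70.
X = np.ones((3, 1))
Y = np.array([[70.0], [120.0], [100.0]])
phi_bar, phi_under = hull_measures(X, Y, k=2)
print(round(phi_bar, 3), round(phi_under, 3), round(phi_bar - phi_under, 3))  # 1.2 0.7 0.5
```

The only change from (BCC.O) is that the input and output constraints are equalities, so dominated bundles are no longer admitted and the ray through $k$ has both an upper and a lower end.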

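When the observation set $R$ is removed from $S$ ($R \subset S$, $k \notin R$), the corresponding convex hull $\hat{T}^{S\setminus R}_{\mathrm{conv}}$ may change. The associated measures are denoted $\overline{\varphi}^{S\setminus R}_{k}$ and $\underline{\varphi}^{S\setminus R}_{k}$ and are computed as follows:

$$\overline{\varphi}^{S\setminus R}_{k} \equiv \max_{\varphi,\lambda}\Big\{\varphi : \sum_{r\in S\setminus R} x_r\lambda_r = x_k;\ \sum_{r\in S\setminus R} y_r\lambda_r = \varphi y_k;\ \sum_{r\in S\setminus R}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\setminus R\Big\}, \tag{3}$$

$$\underline{\varphi}^{S\setminus R}_{k} \equiv \min_{\varphi,\lambda}\Big\{\varphi : \sum_{r\in S\setminus R} x_r\lambda_r = x_k;\ \sum_{r\in S\setminus R} y_r\lambda_r = \varphi y_k;\ \sum_{r\in S\setminus R}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\setminus R\Big\}. \tag{4}$$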

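Since $k \notin R$, (3) and (4) are always feasible: it is always possible to set $\lambda_k = 1$ (with $\varphi = 1$) to obtain a feasible solution. Without $R$, the width becomes $\overline{\varphi}^{S\setminus R}_{k} - \underline{\varphi}^{S\setminus R}_{k}$. Accordingly, the width related to $k$ changes from $\overline{\varphi}^{S}_{k} - \underline{\varphi}^{S}_{k}$ to $\overline{\varphi}^{S\setminus R}_{k} - \underline{\varphi}^{S\setminus R}_{k}$ after $R$ is removed, and the influence of $R$ on observation $k$ is measured as

$$\Delta^{o+i}_{k}(R) \equiv \big(\overline{\varphi}^{S}_{k} - \underline{\varphi}^{S}_{k}\big) - \big(\overline{\varphi}^{S\setminus R}_{k} - \underline{\varphi}^{S\setminus R}_{k}\big). \tag{5}$$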

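The value of $\Delta^{o+i}_{k}(R)$ gives the change in the width of the convex hull with respect to $k$ and $R$. Because removing observations can only shrink the convex hull, $\overline{\varphi}^{S}_{k} \geq \overline{\varphi}^{S\setminus R}_{k} \geq 1$ and $\underline{\varphi}^{S}_{k} \leq \underline{\varphi}^{S\setminus R}_{k} \leq 1$. Hence $\Delta^{o+i}_{k}(R) \geq 0$, and larger values indicate more significant changes in the width of the convex hull with respect to $k$. $\Delta^{o+i}_{k}(R) = 0$ indicates that removing the set $R$ does not affect the radial measures through $k$; $R$ has a significant effect on $k$ if $\Delta^{o+i}_{k}(R)$ is large.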

Notably, $\overline{\varphi}^{S}_{k} = \underline{\varphi}^{S}_{k} = 1$ is possible and implies that $k$ is on both the inner and the outer boundaries. Observations for which this condition holds are typically extreme elements of the EPPS, such as those of maximum or minimum scale. In this case, $\overline{\varphi}^{S\setminus R}_{k} = \underline{\varphi}^{S\setminus R}_{k} = 1$ for any $R$ with $k \notin R$, which indicates that $k$ is unaffected by the removal of any observation set that excludes $k$. However, $k$ may affect others and be flagged as an outlier. This additional information about observation $k$ allows us to characterize and classify the possible source of $k$'s dissimilarity, e.g., extreme scale. However, the interpretation and use of this additional information are case dependent and subject to the user's judgment.

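Other measures that consider only the change in width associated with either the inner or the outer boundary can be defined similarly. $\Delta^{o}_{k}(R)$ is the change caused by the outer boundary shift and $\Delta^{i}_{k}(R)$ is the change caused by the inner boundary shift, as follows:

$$\Delta^{o}_{k}(R) \equiv \overline{\varphi}^{S}_{k} - \overline{\varphi}^{S\setminus R}_{k}, \tag{6}$$

$$\Delta^{i}_{k}(R) \equiv \underline{\varphi}^{S}_{k} - \underline{\varphi}^{S\setminus R}_{k}. \tag{7}$$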

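Note that $\Delta^{o}_{k}(R) \geq 0$ and $0 \geq \Delta^{i}_{k}(R) \geq -1$. Further, $\Delta^{o+i}_{k}(R)$ can be expressed as a combination of $\Delta^{o}_{k}(R)$ and $\Delta^{i}_{k}(R)$:

$$\Delta^{o+i}_{k}(R) \equiv \big(\overline{\varphi}^{S}_{k} - \underline{\varphi}^{S}_{k}\big) - \big(\overline{\varphi}^{S\setminus R}_{k} - \underline{\varphi}^{S\setminus R}_{k}\big) = \big(\overline{\varphi}^{S}_{k} - \overline{\varphi}^{S\setminus R}_{k}\big) - \big(\underline{\varphi}^{S}_{k} - \underline{\varphi}^{S\setminus R}_{k}\big) = \Delta^{o}_{k}(R) - \Delta^{i}_{k}(R) = |\Delta^{o}_{k}(R)| + |\Delta^{i}_{k}(R)|. \tag{8}$$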

Eq. (8) states that the total difference between the widths is the sum of the inner and outer parts. $\Delta^{o}_{k}(R)$ and $\Delta^{i}_{k}(R)$ can also be considered separately to classify $R$ as either an efficient or an inefficient outlier. Eq. (8) assigns equal importance to efficient and inefficient outliers, but this is not necessary: weights can be assigned to $|\Delta^{o}_{k}(R)|$ and $|\Delta^{i}_{k}(R)|$ in (8) to represent differences in the importance of the two types of outliers; such weights are typically determined by subjective judgment.
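The measures (5)–(8) follow from solving (1)–(4) once with all of $S$ and once with $S \setminus R$. This sketch assumes NumPy and SciPy's `linprog`; the function names, the optional weights `w_o`/`w_i` for the weighted variant of (8), and the five-point data set are hypothetical illustrations.

```python
import numpy as np
from scipy.optimize import linprog


def hull_measures(X, Y, k, keep):
    """(phi_bar, phi_under) of observation k w.r.t. the convex hull of rows `keep`."""
    Xs, Ys = X[keep], Y[keep]
    n = len(keep)
    A_eq = np.vstack([
        np.hstack([np.zeros((X.shape[1], 1)), Xs.T]),   # sum lambda x_r = x_k
        np.hstack([-Y[k].reshape(-1, 1), Ys.T]),        # sum lambda y_r = phi*y_k
        np.r_[0.0, np.ones(n)].reshape(1, -1),          # sum lambda = 1
    ])
    b_eq = np.r_[X[k], np.zeros(Y.shape[1]), 1.0]
    bounds = [(None, None)] + [(0, None)] * n
    out = []
    for sign in (-1.0, 1.0):                             # maximize, then minimize phi
        res = linprog(np.r_[sign, np.zeros(n)], A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        if res.status != 0:
            raise ValueError("LP failed; is k contained in keep?")
        out.append(res.x[0])
    return out[0], out[1]


def influence(X, Y, k, R, w_o=1.0, w_i=1.0):
    """Delta^o_k(R), Delta^i_k(R), Delta^{o+i}_k(R) and a weighted variant of (8)."""
    if k in R:
        raise ValueError("k must not belong to R")
    everyone = list(range(X.shape[0]))
    rest = [r for r in everyone if r not in R]
    ob_s, in_s = hull_measures(X, Y, k, everyone)        # eqs (1), (2)
    ob_r, in_r = hull_measures(X, Y, k, rest)            # eqs (3), (4)
    d_o = ob_s - ob_r                                    # eq (6), >= 0
    d_i = in_s - in_r                                    # eq (7), in [-1, 0]
    d_oi = (ob_s - in_s) - (ob_r - in_r)                 # eq (5); equals d_o - d_i by (8)
    return d_o, d_i, d_oi, w_o * abs(d_o) + w_i * abs(d_i)


# Hypothetical single-input, single-output data; observation 4 plays the role of k.
X = np.array([[1.0], [3.0], [3.0], [1.0], [2.0]])
Y = np.array([[1.0], [1.0], [5.0], [3.0], [2.0]])
print(influence(X, Y, 4, {2}))
print(influence(X, Y, 4, {0}, w_i=2.0))
```

Here the hull at $x = 2$ spans $y \in [1, 4]$. Removing observation 2 lowers the outer end to $y = 2$ (so $\Delta^{o} = 1$, $\Delta^{i} = 0$), while removing observation 0 raises only the inner end to $y = 2$ (so $\Delta^{i} = -0.5$), and in both cases $\Delta^{o+i} = \Delta^{o} - \Delta^{i}$ as (8) requires.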

We measure the influence of $R$ on $k$ by the absolute differences in (5)–(7), whereas Pastor et al. [23] use ratios to represent the level of influence. Applying Pastor et al.'s radial measure gives $(\overline{\varphi}^{S}_{k} - \underline{\varphi}^{S}_{k})/(\overline{\varphi}^{S\setminus R}_{k} - \underline{\varphi}^{S\setminus R}_{k})$, $\overline{\varphi}^{S}_{k}/\overline{\varphi}^{S\setminus R}_{k}$ and $\underline{\varphi}^{S}_{k}/\underline{\varphi}^{S\setminus R}_{k}$, associated with (5)–(7), respectively. Ratio measures have particular drawbacks in this context. First, because the width is already expressed as a percentage of $y_k$, a ratio measure expresses a percentage change of a percentage and loses the original geometric interpretation (e.g., the length in a one-output case). Second, (5)–(7) allow the effects of the inner and outer boundary shifts to be quantified and then aggregated (or decomposed) as shown in (8), whereas methods for aggregating the ratio measures are not obvious.

Rather than judge whether $R$ is an outlier, this paper ranks potential outliers by their influence. $\Delta^{*}_{k}(R)$, where $* \in \{o, i, o+i\}$, quantifies only the effect on an individual observation $k$. A summary statistic of the overall influence of $R$ on the data set provides information to characterize $R$. This summary statistic is used to prioritize and identify the observations that justify the costly activity of further examination to confirm the validity of suspicious data. Several summary statistics are available; for example, Wilson [32] uses the total and the average of the individual influences, where the number of observations affected is also of interest. Pastor et al. [23] summarize individual influences with a statistical model so that statistical inference is possible. Metrics such as the total influence $\sum_{k\in S}\Delta^{*}_{k}(R)$ and the average influence are reported in this work so that the graphical meaning of the data can be understood easily. To make a judgment, a threshold must be defined. The selection of a threshold level is case dependent and represents a tradeoff between the cost of confirmation and the expense of including questionable data in the analysis: a “loose” criterion increases the risk that outliers remain in the data, and a “strict” cut costs more in confirmation.
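The summary statistics can be sketched as a loop over single-observation candidates $R = \{j\}$. This assumes NumPy and SciPy's `linprog`; the names `width` and `rank_single_candidates`, the tolerance `tol` used to decide whether an observation counts as "affected", and the four-point data set are hypothetical. Following the case studies, the average influence is the total divided by the number of observations affected.

```python
import numpy as np
from scipy.optimize import linprog


def width(X, Y, k, keep):
    """phi_bar - phi_under for observation k w.r.t. the convex hull of rows `keep`."""
    Xs, Ys = X[keep], Y[keep]
    n = len(keep)
    A_eq = np.vstack([
        np.hstack([np.zeros((X.shape[1], 1)), Xs.T]),   # sum lambda x_r = x_k
        np.hstack([-Y[k].reshape(-1, 1), Ys.T]),        # sum lambda y_r = phi*y_k
        np.r_[0.0, np.ones(n)].reshape(1, -1),          # sum lambda = 1
    ])
    b_eq = np.r_[X[k], np.zeros(Y.shape[1]), 1.0]
    bounds = [(None, None)] + [(0, None)] * n
    phi = [linprog(np.r_[sign, np.zeros(n)], A_eq=A_eq, b_eq=b_eq,
                   bounds=bounds, method="highs").x[0] for sign in (-1.0, 1.0)]
    return phi[0] - phi[1]


def rank_single_candidates(X, Y, tol=1e-7):
    """Rank each R = {j} by total influence, the sum over k not in R of Delta^{o+i}_k(R).

    Returns (j, total, average) tuples, largest total first.
    """
    n = X.shape[0]
    everyone = list(range(n))
    full = [width(X, Y, k, everyone) for k in everyone]   # widths w.r.t. S, reused
    rows = []
    for j in everyone:
        rest = [r for r in everyone if r != j]
        total, affected = 0.0, 0
        for k in rest:
            delta = full[k] - width(X, Y, k, rest)         # eq (5) with R = {j}
            if delta > tol:                                # k counts as affected
                total += delta
                affected += 1
        rows.append((j, total, total / affected if affected else 0.0))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical equal-input data with outputs 70, 100, 110, 120.
X = np.ones((4, 1))
Y = np.array([[70.0], [100.0], [110.0], [120.0]])
rows = rank_single_candidates(X, Y)
for j, total, avg in rows:
    print(j, round(total, 3), round(avg, 3))
```

In this toy data the lowest observation (output 70) ranks first, with a total of about 0.823 spread over three affected observations, because it alone defines the inner boundary; the highest (output 120) ranks second, and the two middle observations have no influence.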

Applications of the suggested model may be problematic if the data set is ill-conditioned, i.e., the number of observations is small and the variables do not vary over a sufficiently wide range [22]. Outlier detection assumes that the input–output space is stable; otherwise it is difficult to distinguish whether the influence is due to the dissimilarity of the observation or to the ill-conditioned data set. To aggregate the input–output space and thereby decrease the number of variables relative to the number of observations, methods proposed by Olesen and Petersen [22] or Pastor et al. [24] may be appropriate.

Fig. 1. A two-output (y1, y2), equal-input illustration of the convex hull approximation and the influence measurements (points O, A, B, C, E, F, G, H, I, k; projections k_wB, k_woB, k_wI, k_woI).

We note that if a few outliers are close to one another, so that no single outlier differs significantly from the rest with respect to any characteristic of interest, approaches that measure the influence of one observation's presence will have difficulty identifying this type of outlying data. This is termed the masking effect, and it is described and illustrated in many cases [30]. To counter masking, combinations of observations should be removed in each stage, with $|R| \geq 2$, yielding the corresponding influence measures. Considering subsets allows heterogeneous subgroups within the analysis to distinguish themselves, so that they can be removed for a separate analysis.

2.3. Example

Fig. 1 presents a two-output equal-input example. Consider an observation set $S = \{A, B, C, E, F, G, H, I, k\}$; the convex hull is ABFGIH. Point $k$ can be scaled up to $k_{wI}$ ($\overline{\varphi}^{S}_{k} = Ok_{wI}/Ok$) on the outer boundary (HIG), and/or scaled down to $k_{wB}$ ($\underline{\varphi}^{S}_{k} = Ok_{wB}/Ok$) on the inner boundary (ABF). The width of the ray $Ok$ within the convex hull ($k_{wI}k_{wB}$) can be measured as $\overline{\varphi}^{S}_{k} - \underline{\varphi}^{S}_{k} = (Ok_{wI} - Ok_{wB})/Ok$. If $B$ is dropped from $S$ ($R = \{B\}$), the distance to the outer boundary remains unchanged ($\overline{\varphi}^{S\setminus\{B\}}_{k} = Ok_{wI}/Ok$), while the inner boundary shifts to ACF, so that $\underline{\varphi}^{S\setminus\{B\}}_{k} = Ok_{woB}/Ok$, and the width becomes $\overline{\varphi}^{S\setminus\{B\}}_{k} - \underline{\varphi}^{S\setminus\{B\}}_{k} = (Ok_{wI} - Ok_{woB})/Ok$.

For $k$, the difference between the widths due to the existence of $B$ can be measured by $\Delta^{o+i}_{k}(\{B\}) = (Ok_{wI} - Ok_{wB})/Ok - (Ok_{wI} - Ok_{woB})/Ok = (Ok_{woB} - Ok_{wB})/Ok$. The inner boundary shift is $\Delta^{i}_{k}(\{B\}) = \underline{\varphi}^{S}_{k} - \underline{\varphi}^{S\setminus\{B\}}_{k} = (Ok_{wB} - Ok_{woB})/Ok$. The outer boundary shift is $\Delta^{o}_{k}(\{B\}) = (Ok_{wI} - Ok_{wI})/Ok = 0$, since $B$ does not affect the outer boundary. Similarly, when only observation $I$ is dropped ($R = \{I\}$), the inner boundary is the same, but the outer boundary changes to HEG. The new width of the convex hull associated with $k$ is $\overline{\varphi}^{S\setminus\{I\}}_{k} - \underline{\varphi}^{S\setminus\{I\}}_{k} = (Ok_{woI} - Ok_{wB})/Ok$. The influence of $I$, $\Delta^{o+i}_{k}(\{I\})$, $\Delta^{o}_{k}(\{I\})$
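Fig. 1 gives no coordinates, so the sketch below uses hypothetical ones, chosen only so that the hull is ABFGIH, the ray through $k$ meets the inner boundary at B and the outer boundary at I, and removing B or I moves the boundary to C or E. It assumes NumPy and SciPy's `linprog`; because every point has the same input, the input equality is implied by $\sum_r \lambda_r = 1$ and is omitted.

```python
import numpy as np
from scipy.optimize import linprog


def hull_measures(Y, k, keep):
    """(phi_bar, phi_under) for equal-input data: only outputs enter the hull."""
    Ys = Y[keep]
    n = len(keep)
    A_eq = np.vstack([
        np.hstack([-Y[k].reshape(-1, 1), Ys.T]),   # sum lambda y_r = phi*y_k
        np.r_[0.0, np.ones(n)].reshape(1, -1),     # sum lambda = 1
    ])
    b_eq = np.r_[np.zeros(Y.shape[1]), 1.0]
    bounds = [(None, None)] + [(0, None)] * n
    out = []
    for sign in (-1.0, 1.0):                       # maximize, then minimize phi
        res = linprog(np.r_[sign, np.zeros(n)], A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        out.append(res.x[0])
    return out[0], out[1]


# Hypothetical output coordinates (y1, y2); all points share the same input.
points = {"A": (0.5, 3.0), "B": (1.0, 1.0), "C": (1.5, 1.5), "E": (2.6, 2.6),
          "F": (3.0, 0.5), "G": (4.0, 1.0), "H": (1.0, 4.0), "I": (3.0, 3.0),
          "k": (2.0, 2.0)}
names = list(points)
Y = np.array([points[p] for p in names])
k = names.index("k")


def width_without(removed):
    """(phi_bar, phi_under, width) for k after dropping the named points."""
    keep = [i for i, p in enumerate(names) if p not in removed]
    ob, inn = hull_measures(Y, k, keep)
    return ob, inn, ob - inn


full = width_without(set())      # k_wI = I (phi_bar = 1.5), k_wB = B (phi_under = 0.5)
no_b = width_without({"B"})      # inner boundary moves to C (phi_under = 0.75)
no_i = width_without({"I"})      # outer boundary moves to E (phi_bar = 1.3)
print("Delta^{o+i}_k({B}) =", round(full[2] - no_b[2], 3))   # 0.25
print("Delta^{o+i}_k({I}) =", round(full[2] - no_i[2], 3))   # 0.2
```

With these coordinates, dropping B changes only the inner end of the ray ($\Delta^{o} = 0$), and dropping I changes only the outer end ($\Delta^{i} = 0$), mirroring the geometry described for Fig. 1.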


2.4. Input-oriented cases

Analogously, for any observation k the following apply in the input-oriented cases:

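$$\underline{\theta}^{S}_{k} \equiv \min_{\theta,\lambda}\Big\{\theta : \sum_{r\in S} x_r\lambda_r = \theta x_k;\ \sum_{r\in S} y_r\lambda_r = y_k;\ \sum_{r\in S}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\Big\}, \tag{9}$$

$$\underline{\theta}^{S\setminus R}_{k} \equiv \min_{\theta,\lambda}\Big\{\theta : \sum_{r\in S\setminus R} x_r\lambda_r = \theta x_k;\ \sum_{r\in S\setminus R} y_r\lambda_r = y_k;\ \sum_{r\in S\setminus R}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\setminus R\Big\}, \tag{10}$$

$$\overline{\theta}^{S}_{k} \equiv \max_{\theta,\lambda}\Big\{\theta : \sum_{r\in S} x_r\lambda_r = \theta x_k;\ \sum_{r\in S} y_r\lambda_r = y_k;\ \sum_{r\in S}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\Big\}, \tag{11}$$

$$\overline{\theta}^{S\setminus R}_{k} \equiv \max_{\theta,\lambda}\Big\{\theta : \sum_{r\in S\setminus R} x_r\lambda_r = \theta x_k;\ \sum_{r\in S\setminus R} y_r\lambda_r = y_k;\ \sum_{r\in S\setminus R}\lambda_r = 1;\ \lambda_r \geq 0,\ r \in S\setminus R\Big\}, \tag{12}$$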

where $k \notin R$. These equations specify the relationship between $k$ and the corresponding convex hull boundaries. By the same argument as in the output-oriented case, (10) and (12) are always feasible. Based on arguments similar to those in Section 2.2, the measures of the effect of $R$ on observation $k$ become

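$$\Delta^{o}_{k}(R) \equiv \overline{\theta}^{S}_{k} - \overline{\theta}^{S\setminus R}_{k}, \tag{13}$$

$$\Delta^{i}_{k}(R) \equiv \underline{\theta}^{S}_{k} - \underline{\theta}^{S\setminus R}_{k}, \tag{14}$$

$$\Delta^{o+i}_{k}(R) \equiv \big(\overline{\theta}^{S}_{k} - \underline{\theta}^{S}_{k}\big) - \big(\overline{\theta}^{S\setminus R}_{k} - \underline{\theta}^{S\setminus R}_{k}\big). \tag{15}$$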

In the input-oriented cases, the outer boundary is associated with the inefficient observations. The corresponding measures are given by (11) and (12), and the related difference is defined by (13). Since $\overline{\theta}^{S}_{k} \geq \overline{\theta}^{S\setminus R}_{k} \geq 1$, $\Delta^{o}_{k}(R) \geq 0$. Similarly, (14) relates to the change of the boundary closer to the origin, which is associated with efficient observations in input-oriented analyses; $0 \geq \Delta^{i}_{k}(R) \geq -1$, because $0 \leq \underline{\theta}^{S}_{k} \leq \underline{\theta}^{S\setminus R}_{k} \leq 1$. Based on the arguments used in Section 2.2, $\Delta^{o+i}_{k}(R)$, defined by (15), is the total change in the width associated with the outlier candidate $R$, and it combines the inner and the outer parts, such that $\Delta^{o+i}_{k}(R) = |\Delta^{o}_{k}(R)| + |\Delta^{i}_{k}(R)|$.
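The input-oriented measures (9)–(15) differ from the output-oriented ones only in which side of the LP is scaled. The sketch below assumes NumPy and SciPy's `linprog`; the function names and the five-point data set are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def input_hull_measures(X, Y, k, keep):
    """(theta_bar, theta_under) of eqs (11)/(12) and (9)/(10) for observation k.

    Decision vector z = [theta, lambda_r for r in keep].
    """
    Xs, Ys = X[keep], Y[keep]
    n = len(keep)
    A_eq = np.vstack([
        np.hstack([-X[k].reshape(-1, 1), Xs.T]),        # sum lambda x_r = theta*x_k
        np.hstack([np.zeros((Y.shape[1], 1)), Ys.T]),   # sum lambda y_r = y_k
        np.r_[0.0, np.ones(n)].reshape(1, -1),          # sum lambda = 1
    ])
    b_eq = np.r_[np.zeros(X.shape[1]), Y[k], 1.0]
    bounds = [(None, None)] + [(0, None)] * n
    out = []
    for sign in (-1.0, 1.0):        # maximize theta (outer), then minimize (inner)
        res = linprog(np.r_[sign, np.zeros(n)], A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        if res.status != 0:
            raise ValueError("LP failed; is k contained in keep?")
        out.append(res.x[0])
    return out[0], out[1]


def input_influence(X, Y, k, R):
    """Delta^o_k(R), Delta^i_k(R), Delta^{o+i}_k(R) of eqs (13)-(15)."""
    if k in R:
        raise ValueError("k must not belong to R")
    everyone = list(range(X.shape[0]))
    rest = [r for r in everyone if r not in R]
    ob_s, in_s = input_hull_measures(X, Y, k, everyone)
    ob_r, in_r = input_hull_measures(X, Y, k, rest)
    return ob_s - ob_r, in_s - in_r, (ob_s - in_s) - (ob_r - in_r)


# Hypothetical single-input, single-output data; observation 4 plays the role of k.
X = np.array([[1.0], [3.0], [3.0], [1.0], [2.0]])
Y = np.array([[1.0], [1.0], [5.0], [3.0], [2.0]])
print(input_influence(X, Y, 4, {1}))
print(input_influence(X, Y, 4, {3}))
```

At output level $y = 2$ the hull spans inputs from 1 to 3, so $\overline{\theta} = 1.5$ and $\underline{\theta} = 0.5$ for $k$. Dropping observation 1 pulls the outer (inefficient) end in to $k$ itself ($\Delta^{o} = 0.5$), whereas dropping observation 3 pushes the inner (efficient) end out to input 1.5 ($\Delta^{i} = -0.25$).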

Depending on the purpose of the analysis, either the input- or the output-oriented approach should be adopted. If an input-oriented DEA model is selected to measure efficiency, an input-oriented influential measure should be used to avoid biased conclusions, and output-oriented influential measures should be selected when an output-oriented analysis is used to quantify efficiency. However, if the orientation of the analysis has not been determined, both metrics are recommended to fully explore the data set and discover unexpected knowledge.

3. Case studies

This section applies the proposed model to four DEA cases. The first two are simulated cases and illustrate the effectiveness of the method through scatter plots of the data. The third case compares the model with earlier works via a common testbed, and the fourth identifies possible outlier suspects in a warehouse data set and shows their impact in a post analysis.

Fig. 2. The scatter plot of case A.

Table 1
Ranking of outliers (case A, output-oriented). Tol. = total influence; Avg. = average influence per affected observation.

        |Δ^o| (outer boundary)   |Δ^i| (inner boundary)   Δ^{o+i}: observation ranked by
Rank    Obs.   Tol.    Avg.      Obs.   Tol.    Avg.      Tol.    Avg.
1       102    15.16   0.344     13     5.48    0.057     102     102
2       59     3.51    0.080     53     5.01    0.209     13      53
3       101    1.39    0.034     79     4.88    0.066     53      87
4       103    1.35    0.025     87     0.097   0.097     79      59
5       30     0.53    0.045     –      –       –         59      79
6       –      –       –         –      –       –         101     13
7       –      –       –         –      –       –         103     30
8       –      –       –         –      –       –         30      101
9       –      –       –         –      –       –         87      103

3.1. Case A – simulated bivariate case

Case A simulates a single-input single-output data set in which 100 observations are generated according to the function [30]

$$Y = X^{0.5} \cdot \exp(-U),$$

where $X \sim \mathrm{uniform}(0, 1)$ and $U$ is exponentially distributed with mean 1/3. Three extremely efficient outliers, 101, 102 and 103, are also added. Fig. 2 plots all 103 data points.
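The data-generating process of Case A can be sketched with Python's standard `random` module. The seed, the generator, and the function name `simulate_case_a` are assumptions, so the draws will not match the paper's; the coordinates of outliers 101 to 103 are not reported in the text and are therefore not generated here.

```python
import math
import random


def simulate_case_a(n=100, seed=0):
    """Draw n points with Y = X**0.5 * exp(-U), X ~ uniform(0, 1), U ~ exponential(mean 1/3)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        u = rng.expovariate(3.0)        # rate 3 gives mean 1/3
        xs.append(x)
        ys.append(math.sqrt(x) * math.exp(-u))
    return xs, ys


xs, ys = simulate_case_a()
# The three efficient outliers (101-103) would be appended here; their
# coordinates are not given in the text, so none are invented.
```

Because $\exp(-U) \leq 1$, every simulated point lies on or below the frontier $Y = X^{0.5}$, so any point added above that curve is efficient by construction.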

Table 1 summarizes the outlier ranking (by total influence, $\sum_{k\in S}\Delta^{*}_{k}(R)$) for an output-oriented analysis using the proposed method;² the associated points are also indicated in Fig. 2. The first panel of Table 1 corresponds to the outer boundary, and only five observations affect this boundary, including the extremely efficient outliers 101, 102 and 103. The second and third columns present the total influence and the average influence (the total influence divided by the number of observations affected), respectively.

² The input-oriented analysis was also calculated, and similar results were obtained. The analysis is available upon request.


Fig. 3. Histogram of BCC estimates from Scheel [26] (output-oriented; frequency of efficiency scores in intervals of width 0.1 from 0.1 to 1.0).

Similar to the first panel, the second panel ranks the total influence associated with the inner boundary. Only four outliers affect the inner boundary, as can be verified in Fig. 2. The third panel ranks outliers by the changes in the total and average convex hull widths, which are the sums of the inner and outer parts as specified by (8).

Observations 101, 102 and 103 are flagged as outliers; this result is consistent with the data generating process, in which 101, 102 and 103 are purposely added as efficient outliers. Further, the average influence of all listed observations except 102 and 53 does not exceed 0.1 (10%), which shows that the threshold is case dependent if the gauging process is used to filter each new observation added to the data set. This observation is also pointed out by Simar [30]. Finally, the masking effect does exist, as seen in Fig. 2. Observations 59 and 53 are flagged as outliers, although they are extreme in scale and differ somewhat from 101, 102 and 103. As discussed in Section 2.2, both 59 and 53 have $\overline{\varphi}^{S}_{k} = \underline{\varphi}^{S}_{k} = 1$ (unlike 101, 102 and 103) and can easily be classified as extreme in scale but belonging in the analysis.

3.2. Case B – simulated bivariate case with empirical efficiency estimate distribution

Although it is necessary to identify inefficient outliers, one can argue that it is easier, and sufficient, to check the empirical distribution of the efficiency estimates and flag any observations below a defined efficiency threshold as outliers (referred to in this paper as the trimming method). The following example demonstrates that this simple idea cannot be applied effectively in some circumstances. Simulated efficiency estimates are commonly assumed to follow exponential or half-normal distributions with a significant tail [20]; a percentage of observations in the tail can thus be specified and flagged as outliers. However, in many observed applications DEA efficiency estimates do not follow this pattern. Fig. 3 plots the distribution of the output-oriented BCC efficiency estimates based on the data collected by Scheel [26], in which 63 observations each have four inputs and two outputs.³ The distribution of estimates fits neither the exponential nor the half-normal distribution, and it is difficult to identify extremely inefficient observations. Fig. 3 shows a clear gap in the distribution between 0.6 and 0.7. If a trimming approach were used with 0.6 as the efficiency level below which data are removed, 26 observations would be removed. However, these observations are not necessarily distant from other observations when mapped in input–output space, as Fig. 4 illustrates.

³ The data set is available at http://www.wiso.uni-dortmund.de/lsfg/or/scheel/doordea.htm.

Fig. 4. The scatter plot of case B.

Table 2
Ranking of outliers (case B, output-oriented). Tol. = total influence; Avg. = average influence per affected observation.

        |Δ^o| (outer boundary)   |Δ^i| (inner boundary)   Δ^{o+i}: observation ranked by
Rank    Obs.   Tol.    Avg.      Obs.   Tol.    Avg.      Tol.    Avg.
1       43     4.77    0.795     20     2.91    0.052     43      43
2       34     0.715   0.089     31     1.77    0.047     20      15
3       25     0.648   0.043     15     0.62    0.310     31      56
4       56     0.193   0.096     9      0.21    0.011     34      34
5       33     0.080   0.007     –      –       –         25      20
6       55     0.027   0.002     –      –       –         15      31
7       30     0.022   0.022     –      –       –         9       25
8       –      –       –         –      –       –         56      30
9       –      –       –         –      –       –         33      9
10      –      –       –         –      –       –         55      33

Ranking of efficiency estimates from the bottom: 1, 20, 58, 24, 61, 40, 9, 15.
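For comparison, the trimming method that this example argues against can be written in a few lines. This is a hypothetical sketch with invented efficiency values (not Scheel's data); the function names are illustrative. It looks only at the efficiency distribution and never at where the observations lie in input–output space, which is exactly its weakness.

```python
def trim_by_threshold(efficiency, threshold):
    """Trimming method: flag observations whose efficiency estimate is below threshold."""
    return [i for i, e in enumerate(efficiency) if e < threshold]


def trim_by_tail_share(efficiency, share):
    """Flag the lowest `share` fraction of observations (e.g. 0.05 for a 5% tail)."""
    m = int(round(share * len(efficiency)))
    order = sorted(range(len(efficiency)), key=lambda i: efficiency[i])
    return sorted(order[:m])


# Hypothetical efficiency estimates in (0, 1]; not Scheel's data.
eff = [0.95, 0.42, 1.0, 0.58, 0.71, 0.66, 0.30, 0.88]
print(trim_by_threshold(eff, 0.6))     # [1, 3, 6]
print(trim_by_tail_share(eff, 0.25))   # [1, 6]
```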

Case B is a bivariate output-oriented case. Rather than following the exponential distribution as in Case A, the efficiency estimates follow the empirical distribution obtained from Scheel [26] (Fig. 3). Sixty-three points are generated according to $Y = X^{0.5} \cdot E$, where $X \sim \mathrm{uniform}(0, 1)$ and output efficiency estimates $E$ from Scheel's data are randomly assigned. Fig. 4 displays the scatter plot of the 63 points, and Table 2 ranks the outliers.
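The Case B generator can be sketched in the same way as Case A. Scheel's 63 estimates are not reproduced in the text, so the function takes them as an argument; the stand-in scores, the seed, and the name `simulate_case_b` are hypothetical.

```python
import math
import random


def simulate_case_b(efficiency_scores, seed=0):
    """Y = X**0.5 * E with X ~ uniform(0, 1) and E a random permutation of the given scores."""
    rng = random.Random(seed)
    e = list(efficiency_scores)
    rng.shuffle(e)                          # randomly assign the scores to the points
    xs = [rng.uniform(0.0, 1.0) for _ in e]
    ys = [math.sqrt(x) * s for x, s in zip(xs, e)]
    return xs, ys, e


scores = [0.45, 0.62, 0.70, 0.85, 1.0]     # hypothetical stand-ins for Scheel's estimates
xs, ys, assigned = simulate_case_b(scores, seed=1)
```

Each simulated point's vertical distance below the frontier $Y = X^{0.5}$ reproduces its assigned efficiency score, so the efficiency distribution is preserved while the positions in input–output space are random.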

Seven observations influence the outer boundary, but only observation 43 has a strong impact on the other observations, with an average change of 0.795. However, Fig. 4 reveals that observation 43 has an extreme scale and can be identified as such, since $\overline{\varphi}^{S}_{43} = \underline{\varphi}^{S}_{43} = 1$, which is consistent with the data generating process. For the inner boundary, four outliers have inefficient output estimates. The bottom of Table 2 lists the observations with the lowest efficiency estimates. These observations are not necessarily outliers, either by visual inspection or according to the proposed outlier detection scheme.

This example shows that the proposed approach can detect both efficient and inefficient outliers. In particular, it demonstrates that when DEA efficiency estimates follow an empirical distribution, simply flagging the worst-performing observations as inefficient outliers can yield misleading results.


Table 3

Ranking of outliers (case C, input-oriented).

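| Rank | Tol. i | Tol. o | Tol. o+i | Avg. i | Avg. o | Avg. o+i | Fox04 Mix | Fox04 Scale | Fox04 AD | W93 | W95 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 47 | 15 | 47 | 47 | 31 | 10 | 66 | 59 | 59 | 59 | 59 |
| 2 | 44 | 31 | 10 | 10 | 10 | 47 | 48 | 32 | 32 | 44 | 44 |
| 3 | 10 | 10 | 15 | 57 | 15 | 31 | 15 | 69 | 69 | 33 | 52 |
| 4 | 57 | 43 | 44 | 59 | 54 | 54 | 56 | 5 | 5 | 66 | 69 |
| 5 | 49 | 54 | 31 | 20 | 7 | 15 | 69 | 62 | 62 | 35 | 62 |
| 6 | 59 | 8 | 57 | 44 | 43 | 59 | 49 | 44 | 44 | 54 | 56 |
| 7 | 52 | 7 | 54 | 54 | 58 | 57 | 68 | 29 | 29 | 68 | 15 |
| 8 | 66 | 51 | 49 | 68 | 8 | 20 | 5 | 61 | 61 | 67 | 58 |
| 9 | 20 | 58 | 66 | 48 | 24 | 44 | 61 | 38 | 48 | 8 | 45 |
| 10 | 54 | 38 | 8 | 45 | 51 | 43 | 67 | 48 | 38 | 50 | 17 |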

3.3. Case C – empirical multi-input and multi-output case

Data collected in Charnes et al. [7] are used as a multi-input, multi-output example. These data, containing 70 observations, each with five inputs and three outputs, constitute a common testbed for outlier detection studies. An input-oriented analysis is applied, and the resulting ranks (for both the total and the average measures) are compared against Wilson [31,32] (W93 and W95, respectively) and Fox et al. [13] (Fox04) (Table 3).

Wilson [32] measures influence based on the change in the super-efficiency estimates defined by Andersen and Petersen [1]. Wilson [31] extends the outlier measure suggested by Andrews and Pregibon [2], which flags as outliers the observations that contribute the largest proportion of the volume of the full data set, to the case of multiple outputs, thereby examining the geometric properties of the input–output data directly. Fox et al. [13] propose metrics that measure the dissimilarity between any two input–output vectors in terms of scale and mix (and also the composition of scale and mix); observations with the highest summary dissimilarity are considered outliers. The W93 and Fox04 rankings are taken from Fox et al. [13], and the W95 ranking is based on total influence. In [32], the super-efficiency of observation 59 is undefined because the scale of this observation is extremely large in input-oriented analyses or extremely small in output-oriented analyses; Fox et al. [13] also present evidence of this finding. However, unlike in these other investigations, observation 59 is not ranked as a top outlier by the proposed method, because points such as 1, 21, 44 and 54 are now on the boundaries of the convex hull, so observation 59 does not affect them. With fewer points affected, observation 59 has less overall effect on the entire data set.
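To make the volume idea behind Andrews and Pregibon [2] concrete, the sketch below computes single-deletion determinant ratios for a matrix that stacks an intercept column, the inputs and the outputs. It is a simplified illustration under our own assumptions (one observation removed at a time, hypothetical data, function name `volume_ratios` ours); Wilson's multi-output statistic [31] differs in its details. An observation whose removal leaves a small ratio accounts for a large share of the data volume.

```python
import numpy as np

def volume_ratios(inputs, outputs):
    """Single-deletion ratios R_k = det(Z_(k)^T Z_(k)) / det(Z^T Z).

    Z stacks an intercept column, the inputs and the outputs (one row per
    observation); Z_(k) is Z with row k removed.  A small R_k means that
    observation k accounts for a large share of the data volume.
    """
    Z = np.column_stack([np.ones(len(inputs)), inputs, outputs])
    full = np.linalg.det(Z.T @ Z)
    ratios = np.empty(Z.shape[0])
    for k in range(Z.shape[0]):
        Zk = np.delete(Z, k, axis=0)       # drop observation k
        ratios[k] = np.linalg.det(Zk.T @ Zk) / full
    return ratios

# Hypothetical data: one input, one output, six observations.
x = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 5.0])
y = np.array([1.0, 1.1, 0.8, 1.2, 0.9, 5.0])
r = volume_ratios(x, y)
print(int(np.argmin(r)))  # 5, the observation far from the others
```

By the matrix determinant lemma, each ratio equals $1 - h_{kk}$, where $h_{kk}$ is the leverage of observation $k$, so this single-deletion version amounts to a leverage ranking; removing sets of observations, as Andrews and Pregibon also consider, is needed to address masking.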

The results obtained using the various methods reveal some discrepancies. As Fox et al. [13] note, the outlier detection schemes focus on different aspects of the data and therefore produce different conclusions. Wilson [32] also suggests applying more than one approach to detect outliers. The consistency among the conclusions of different approaches is a useful index for prioritizing the data to be investigated: a data point is more likely to be an outlier when it is flagged by several methods. Further, inconsistency among the methods can suggest directions for further study to better understand the data.
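The consensus idea can be illustrated with the top-10 lists in Table 3. The Python sketch below counts, for each observation, how many of the selected rankings include it; the choice of columns (the two i+o rankings of the proposed method plus the Fox04, W93 and W95 columns) is our own, and any other subset could be used.

```python
from collections import Counter

# Top-10 observations per method, transcribed from Table 3 (Case C).
top10 = {
    "Tol o+i": [47, 10, 15, 44, 31, 57, 54, 49, 66, 8],
    "Avg o+i": [10, 47, 31, 54, 15, 59, 57, 20, 44, 43],
    "Mix":     [66, 48, 15, 56, 69, 49, 68, 5, 61, 67],
    "Scale":   [59, 32, 69, 5, 62, 44, 29, 61, 38, 48],
    "AD":      [59, 32, 69, 5, 62, 44, 29, 61, 48, 38],
    "W93":     [59, 44, 33, 66, 35, 54, 68, 67, 8, 50],
    "W95":     [59, 44, 52, 69, 62, 56, 15, 58, 45, 17],
}

# Number of rankings that place each observation in their top 10.
votes = Counter(obs for ranking in top10.values() for obs in ranking)
for obs, n in votes.most_common(4):
    print(obs, n)
```

Observation 44 appears in six of the seven lists and observation 59 in five. Because the Scale and AD columns are nearly identical, counting both gives extra weight to scale-related evidence, which is one reason to choose the set of methods deliberately.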

3.4. Case D – warehouse performance

In this case, the warehouse performance data collected by Hackman et al. [17] are used to demonstrate the effectiveness of the proposed outlier detection scheme, and especially the influence of suspect records. There are 57 warehouse records, each with three inputs and five outputs; some warehouses are unionized while others are not. Eight records (1, 3, 6, 28, 35, 38, 46 and 50) are flagged as potential outliers based on (8) with a threshold of 0.05. Three of the eight flagged observations are located on the inefficient frontier, but they do not have the lowest efficiency estimates; they rank third, 18th and 27th least efficient among all records. This supports the insight that the outlier detection method does not simply identify the records with the lowest efficiency estimates; rather, it identifies the observations that most significantly distort the production possibility set.

Table 4

Summary of warehouse performance comparisons.

| Investment | Full data (57 records): >$1 M | Full data (57 records): ≤$1 M | Without outliers (49 records): >$1 M | Without outliers (49 records): ≤$1 M |
|---|---|---|---|---|
| Number of observations | 36 | 21 | 29 | 20 |
| Sample standard deviation | 0.293 | 0.115 | 0.282 | 0.109 |
| p-Value | 0.147 | | 0.0989 | |

To investigate the relationship between warehouse performance and capital investment, particularly the performance of those with equipment investment of more than $1 million, an output-oriented BCC analysis (BCC.O) is conducted. All warehouses are pooled to obtain their efficiency estimates.

The hypothesis test originally applied in DEA by Banker [3] is used to test whether the two groups (more than/at most $1 million of equipment investment) perform identically. The test assumes that two groups with identical performance have the same parameters of the efficiency estimate distribution. Banker suggests a half-normal distribution, which is completely characterized by its scale parameter (the standard deviation of the underlying normal distribution), because the underlying normal distribution has mean zero by construction. Therefore, the hypothesis tested is

$$H_0: \sigma_H = \sigma_L \quad \text{against} \quad H_1: \sigma_H \neq \sigma_L,$$

where $\sigma_H$ and $\sigma_L$ are the population standard deviations for the high and low equipment investment warehouses, respectively.
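For concreteness, the statistic commonly associated with Banker's half-normal test [3] can be sketched as follows, assuming output-oriented estimates θ ≥ 1 so that θ − 1 measures inefficiency. Under $H_0$, the ratio of the two groups' mean squared inefficiencies is compared with an F distribution whose degrees of freedom are the two group sizes. The function name and the data are hypothetical; the data are not the warehouse records summarized in Table 4.

```python
def banker_half_normal_f(eff_high, eff_low):
    """Ratio of mean squared inefficiencies for H0: sigma_H = sigma_L.

    Assumes output-oriented estimates theta >= 1, so theta - 1 is the
    inefficiency.  Returns (F, df1, df2); under H0 and half-normal
    inefficiency, F is compared with an F(df1, df2) distribution.
    Raises ZeroDivisionError if every unit in eff_low is fully efficient.
    """
    def mean_sq_ineff(effs):
        # Half-normal maximum-likelihood estimate of sigma**2
        return sum((t - 1.0) ** 2 for t in effs) / len(effs)

    F = mean_sq_ineff(eff_high) / mean_sq_ineff(eff_low)
    return F, len(eff_high), len(eff_low)

# Hypothetical efficiency estimates (NOT the warehouse data of Table 4).
high = [1.0, 1.2, 1.5, 1.1, 1.8]
low = [1.0, 1.1, 1.2, 1.05]
F, df1, df2 = banker_half_normal_f(high, low)
print(round(F, 2), df1, df2)  # 14.32 5 4
```

A p-value can then be read from the F(df1, df2) distribution, for example with `scipy.stats.f.sf`; for the two-sided alternative, the smaller of the two tail probabilities is doubled.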

Table 4 summarizes the efficiency estimates using all records and after removing the eight potential outliers. Using all 57 records as peers, the standard deviation of the efficiency estimates⁴ for warehouses with more than $1 million of equipment investment is larger than that for warehouses with at most $1 million (0.293 vs. 0.115), but the difference is not statistically significant: the p-value is 0.147 using the full data set as the sample. Thus we fail to reject the null hypothesis at the 0.1 significance level and find no evidence that a warehouse's equipment level affects its performance. After identifying and removing the eight possible outliers, the difference in standard deviation between the two populations is 0.173 (0.282 vs. 0.109). The p-value is 0.0989, so we reject H0 at the 0.1 level and conclude that equipment level does affect warehouse performance.

The results show that the proposed outlier detection scheme identifies both efficient and inefficient outliers that can affect the analysis and lead to different conclusions. However, the method only flags the observations that are most dissimilar to the other observations in the data set, as measured by their influence on the EPPS; these observations warrant further investigation. Should further investigation confirm that all flagged observations ought to be removed, our results indicate that their removal changes the results of the post analysis.

4. Computational remark

The computation of the influence measure for a removal set R is based on removing an observation (or a set of observations) and calculating the influence on the remaining observations. This type of method requires a massive computational effort, particularly when it is necessary to eliminate the masking effect. We suggest a computation strategy that greatly reduces the computation time and can be used to investigate the masking effect.

When |R| = 1 and all observations are tested as potential outliers, 2×|S|×|S| linear programming (LP) problems must be solved, because for every observation we measure the influence on every other observation for both boundaries. In practice, it suffices to calculate the effect that removing observations on a boundary has on the observations that are not on that boundary. Without loss of generality, consider output-oriented analyses: a basic optimal solution of the corresponding LP problems (1) and (2) has at most |I|+|J| non-zero λ's. Thus at least |S|−|I|−|J| λ's are zero, and (3) and (4) yield the same optimal values as (1) and (2) when any of these observations is the removal candidate. This can be stated formally in the following propositions:

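Proposition 2. For k ∈ S and p ∈ S, the optimal value of (1) for observation k is unchanged when p is removed from the reference set (that is, (3) with R = {p} attains the same value) if λ*_p = 0 in the optimal solution of (1) providing that value.

Proof. See the appendix.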

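Proposition 3. For k ∈ S and p ∈ S, the optimal value of (2) for observation k is unchanged when p is removed from the reference set (that is, (4) with R = {p} attains the same value) if λ*_p = 0 in the optimal solution of (2) providing that value.

Proof. See the appendix.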

That is, at most |I|+|J| observations determine a given observation k's reference point on the boundary, so removing any one of the remaining (at least |S|−|I|−|J|) observations will not affect the outer boundary as it pertains to k ∈ S. Therefore, computing the outer influence measure of k in (6) does not require examining the removal of all |S|−1 other observations, but only of the points with non-zero λ's in the optimal solution of (1). Identical arguments apply to the inner influence measure of k in (7). These observations yield a simplified procedure that solves at most 2×|S|×(|I|+|J|+1) LP problems, which greatly reduces the number of LP problems to be solved, especially in the typical case |S| ≫ |I|+|J|. Further, observations on the outer (inner) boundary are not affected by the removal of any other observations when the distance to the outer (inner) boundary is measured. Namely, removing other observations has no influence on k's outer (inner) measure when the optimal value of (1) ((2)) for k equals 1. This can be stated formally as follows:

Proposition 4. For k ∈ S, if the optimal value of (1) for k equals 1, then the optimal value of (3) for k equals 1 for every R ⊂ S with k ∉ R.

Proof. See the appendix.

Proposition 5. For k ∈ S, if the optimal value of (2) for k equals 1, then the optimal value of (4) for k equals 1 for every R ⊂ S with k ∉ R.

Proof. See the appendix.

Thus, it is not necessary to solve (3) and (4) for an observation k that satisfies the sufficient condition of Proposition 4 or 5, respectively. Depending on the data distribution, this further reduces the number of LP problems to be solved. For example, if 20% of the data lie on the outer boundary and 15% on the inner boundary, Propositions 4 and 5 show that 0.2×|S|×(|I|+|J|) + 0.15×|S|×(|I|+|J|) of the at most 2×|S|×(|I|+|J|+1) LP problems need not be solved.
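The screening rules of Propositions 2–5 can be sketched as a small filter that decides, for one evaluated observation k, which removal problems (3) or (4) still need to be solved. This Python sketch assumes the optimal λ values of (1) or (2) for k are already available from some LP solver; the function name `removal_candidates`, the 0-based indexing and the numerical tolerance `tol` are our assumptions.

```python
def removal_candidates(lambdas, k, on_boundary, tol=1e-9):
    """Indices p whose removal must be evaluated for observation k.

    lambdas     -- optimal lambda values of (1) (or (2)) for k, 0-based indices
    on_boundary -- True if the optimal value of (1) (or (2)) for k equals 1
    tol         -- numerical threshold for treating a lambda as non-zero
    """
    if on_boundary:
        # Propositions 4/5: no removal can change k's value.
        return []
    # Propositions 2/3: removing p with lambda_p = 0 leaves k's value unchanged.
    return [p for p, lam in enumerate(lambdas) if p != k and lam > tol]

# Hypothetical optimal lambdas for an observation k = 2 not on the boundary.
lam = [0.0, 0.6, 0.0, 0.4, 0.0]
todo = removal_candidates(lam, k=2, on_boundary=False)
print(todo)           # [1, 3]
print(1 + len(todo))  # 3 LPs for k on this boundary: base problem plus removals
```

The tolerance matters in practice because LP solvers return values such as 1e-12 instead of an exact zero.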

The procedure can be extended to cases with |R| ≥ 2. For example, for each k there are (|S|−1)×(|S|−2) ordered choices of R with |R| = 2, but the argument of Proposition 2 implies that only (|I|+|J|)×(|S|−2) of them need to be solved for the reduced reference set S\R, namely those in which R contains an observation with a non-zero λ. Moreover, integrating methods that accelerate solving a single DEA problem, such as Chen and Cho [8] and Dula [11], can further reduce the computational time. The proposed idea can also be applied to radial influence measures such as that of Pastor et al. [23]. Indeed, outlier detection increases the need to accelerate DEA computations and provides an application for a variety of acceleration methods.
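The LP counts discussed in this section can be tabulated with a few lines of arithmetic. The sketch below uses hypothetical sizes (|S| = 100, |I| = 3, |J| = 5, with 20% of observations on the outer boundary and 15% on the inner boundary); the helper names `lp_counts` and `pair_counts` are ours.

```python
def lp_counts(n_obs, n_inputs, n_outputs, n_outer=0, n_inner=0):
    """LP counts for |R| = 1 over both boundaries.

    Returns (all removals, bound after Propositions 2 and 3,
    that bound minus the problems skipped by Propositions 4 and 5).
    """
    m = n_inputs + n_outputs
    naive = 2 * n_obs * n_obs          # base problem plus |S|-1 removals, per k and boundary
    screened = 2 * n_obs * (m + 1)     # base problem plus at most |I|+|J| removals
    saved = (n_outer + n_inner) * m    # removal problems skipped for boundary points
    return naive, screened, screened - saved

def pair_counts(n_obs, n_inputs, n_outputs):
    """Ordered removal pairs (|R| = 2) per k: all pairs vs. those that must be solved."""
    return (n_obs - 1) * (n_obs - 2), (n_inputs + n_outputs) * (n_obs - 2)

print(lp_counts(100, 3, 5, n_outer=20, n_inner=15))  # (20000, 1800, 1520)
print(pair_counts(100, 3, 5))                         # (9702, 784)
```

With these sizes, screening cuts the single-removal workload from 20,000 to at most 1,520 LP problems.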

5. Conclusion

This paper presents an outlier detection method that ranks the outliers to be investigated by their influence. Unlike previous outlier detection schemes, this method also identifies inefficient outliers that could affect post-efficiency-estimation analysis. Whereas previous studies do not reconcile their approaches with the axioms of DEA, the method presented in this paper uses the convex hull of the data, obtained by relaxing the free disposability axiom, which allows the detection of inefficient outliers. In the case studies presented, the proposed method effectively ranks outliers and provides additional information about their locations in the input–output space. The case studies provide counter-examples to the intuitive but mistaken belief that observations with poor efficiency estimates are more likely to be outliers. A real-world case also shows that outliers can lead to improper conclusions in post analysis based on DEA efficiency, such as testing the difference in efficiency between two populations using the Kolmogorov–Smirnov test. Moreover, we propose a strategy to reduce the computation time of outlier detection and suggest that it can be applied to other computationally intensive influence measures, such as that suggested in Pastor et al. [23].

Acknowledgements

This research was supported in part by the National Science Council, Taiwan under Grant NSC 94-2213-E-009-078 and NSC 97-2221-E-009-113. The authors gratefully acknowledge the computing assistance of Mr. Chin-Chia Kuo.

Appendix.

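Proposition 1. For k ∈ S, the optimal value of (1) equals 1 if the (BCC.O) efficiency estimate of k equals 1.

Proof. (1) is identical to (BCC.O) except that all constraints are equalities. The feasible region of (1) is therefore contained in that of (BCC.O), so the optimal value of (BCC.O) is at least the optimal value of (1). Moreover, the optimal value of (1) is at least 1, because the solution λ_k = 1 and λ_r = 0 for r ≠ k is feasible with objective value 1. Hence a (BCC.O) value of 1 forces the optimal value of (1) to equal 1.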
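For reference, the relation between (BCC.O) and (1) used in this proof can be written out. The display below is a sketch that assumes the standard output-oriented BCC envelopment form; θ is our label for the objective variable, and the equality version is the one whose dual is (D1).

$$
\begin{aligned}
\text{(BCC.O)}:\quad &\max_{\theta,\lambda}\ \theta \quad \text{s.t.}\quad \sum_{r\in S}\lambda_r x_{ir} \le x_{ik},\ i\in I;\quad \sum_{r\in S}\lambda_r y_{jr} \ge \theta\, y_{jk},\ j\in J;\quad \sum_{r\in S}\lambda_r = 1;\quad \lambda_r \ge 0,\\
\text{(1)}:\quad &\max_{\theta,\lambda}\ \theta \quad \text{s.t.}\quad \sum_{r\in S}\lambda_r x_{ir} = x_{ik},\ i\in I;\quad \sum_{r\in S}\lambda_r y_{jr} = \theta\, y_{jk},\ j\in J;\quad \sum_{r\in S}\lambda_r = 1;\quad \lambda_r \ge 0.
\end{aligned}
$$

Every feasible solution of (1) is feasible for (BCC.O), which is why the optimal value of (1) cannot exceed that of (BCC.O).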

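Proposition 2. For k ∈ S and p ∈ S, the optimal value of (1) for observation k is unchanged when p is removed from the reference set, i.e., it equals the optimal value of (3) with R = {p}, if λ*_p = 0 in the optimal solution of (1) providing that value.

Proof. The dual of (1) is

$$
\begin{aligned}
\min_{u_i,\,v_j,\,u_0}\ & \sum_{i\in I} u_i x_{ik} + u_0\\
\text{s.t.}\ & \sum_{i\in I} u_i x_{ir} + u_0 \ge \sum_{j\in J} v_j y_{jr}, \quad r\in S,\\
& \sum_{j\in J} v_j y_{jk} = 1.
\end{aligned}
\qquad \text{(D1)}
$$

To satisfy $\sum_{r\in S}\lambda_r = 1$ in (1), there must exist $q \in S$ such that $\lambda^*_q > 0$. By complementary slackness, $\left(\sum_{i\in I} u^*_i x_{iq} + u^*_0 - \sum_{j\in J} v^*_j y_{jq}\right)\lambda^*_q = 0$, where $u^*_i$, $u^*_0$ and $v^*_j$ are optimal for (D1); thus $\sum_{i\in I} u^*_i x_{iq} + u^*_0 - \sum_{j\in J} v^*_j y_{jq} = 0$, i.e., the type 1 constraint associated with $q$ in (D1) is binding. Clearly $q \neq p$, since $\lambda^*_p = 0$. Removing $p$ from the reference set deletes the variable $\lambda_p$ from (1) and the constraint $\sum_{i\in I} u_i x_{ip} + u_0 \ge \sum_{j\in J} v_j y_{jp}$ from (D1). Because $\lambda^*_p = 0$, the remaining components of $\lambda^*$ are still feasible for the reduced primal problem, and $(u^*, v^*, u^*_0)$ is still feasible for the reduced dual; both keep their (equal) objective values, so by weak duality both remain optimal. Therefore, the optimal value of (1) for k is unchanged when p is removed.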

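Proposition 3. For k ∈ S and p ∈ S, the optimal value of (2) for observation k is unchanged when p is removed from the reference set, i.e., it equals the optimal value of (4) with R = {p}, if λ*_p = 0 in the optimal solution of (2) providing that value.

Proof. Apply the same arguments as for Proposition 2.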

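Proposition 4. For k ∈ S, if the optimal value of (1) for k equals 1, then the optimal value of (3) for k equals 1 for every R ⊂ S with k ∉ R.

Proof. Since k ∉ R, the solution λ_k = 1 and λ_r = 0 for r ≠ k remains feasible in (3) with objective value 1, so the optimal value of (3) is at least 1. When the optimal value of (1) equals 1, λ*_k = 1 and λ*_r = 0 for r ∈ S\{k} is the optimal solution (or one of the optimal solutions) of (1). By complementary slackness, $\left(\sum_{i\in I} u^*_i x_{ik} + u^*_0 - \sum_{j\in J} v^*_j y_{jk}\right)\lambda^*_k = 0$, where $u^*_i$, $u^*_0$ and $v^*_j$ are optimal for the dual (D1); thus $\sum_{i\in I} u^*_i x_{ik} + u^*_0 - \sum_{j\in J} v^*_j y_{jk} = 0$, and the type 1 constraint associated with k in (D1) is binding. Removing the type 1 constraints associated with R from (D1) gives the dual of (3). The solution $(u^*, v^*, u^*_0)$ remains feasible for this reduced dual with the same objective value 1, so by weak duality the optimal value of (3) is at most 1. Therefore, the optimal value of (3) for k equals 1.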

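Proposition 5. For k ∈ S, if the optimal value of (2) for k equals 1, then the optimal value of (4) for k equals 1 for every R ⊂ S with k ∉ R.

Proof. Apply the same arguments as for Proposition 4.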

References

[1] Andersen P, Petersen NC. A procedure for ranking efficient units in data envelopment analysis. Management Science 1993;39(10):1261–4.

[2] Andrews DF, Pregibon D. Finding the outliers that matter. Journal of the Royal Statistical Society, Series B 1978;40:85–93.

[3] Banker RD. Maximum-likelihood, consistency and data envelopment analysis – a statistical foundation. Management Science 1993;39(10):1265–73.

[4] Banker RD, Charnes A, Cooper WW. Some models for estimating technical and scale inefficiency in data envelopment analysis. Management Science 1984;30(9):1078–92.

[5] Barnett V, Lewis T. Outliers in statistical data. New York: Wiley; 1995.

[6] Charnes A, Cooper WW, Rhodes E. Measuring the efficiency of decision making units. European Journal of Operational Research 1978;2:429–44.

[7] Charnes A, Cooper WW, Rhodes E. Evaluating program and managerial efficiency: an application of data envelopment analysis to program flow through. Management Science 1981;27(6):668–97.

[8] Chen W-C, Cho W-J. A procedure for large-scale DEA computations. Computers and Operations Research 2009;36(6):1813–24.

[9] Chen Y, Liang L, Yang F, Zhu J. Evaluation of information technology investment: a data envelopment analysis approach. Computers and Operations Research 2006;33(5):1368–79.

[10] Chen Y, Gregoriou GN, Rouah FD. Efficiency persistence of bank and thrift CEOs using data envelopment analysis. Computers and Operations Research 2009;36(5):1554–61.

[11] Dula JH. A computational study of DEA with massive data sets. Computers and Operations Research 2008;35:1191–203.

[12] Farrell MJ. The measurement of productivity efficiency. Journal of the Royal Statistical Society 1957;120:377–91.

[13] Fox KJ, Hill RJ, Diewert WE. Identifying outliers in multi-output models. Journal of Productivity Analysis 2004;22:73–94.

[14] Fried HO, Lovell CAK, Turner JA. An analysis of the performance of university-affiliated credit unions. Computers and Operations Research 1996;23(4):375–84.

[15] Gattoufi S, Oral M, Kumar A, Reisman A. Content analysis of data envelopment analysis literature and its comparison with that of other OR/MS fields. Journal of the Operational Research Society 2004;55(9):911–35.

[16] Hackman ST. Production economics: integrating the microeconomic and engineering perspectives. Berlin, Heidelberg: Springer; 2008.

[17] Hackman ST, Frazelle EH, Griffin P, Griffin SO, Vlatsa DA. Benchmarking warehousing and distribution operations: an input–output approach. Journal of Productivity Analysis 2001;16:79–100.

[18] Johnson AL, McGinnis LF. Outlier detection in two-stage semiparametric DEA models. European Journal of Operational Research 2008;187(2):629–35.

[19] Johnson AL, Chen W-C, McGinnis LF. Internet-based benchmarking for warehouse operations. Working Paper, 2008.

[20] Kumbhakar SC, Lovell CAK. Stochastic frontier analysis. Cambridge, UK: Cambridge University Press; 2000.

[21] Muñiz M, Paradi J, Ruggiero J, Yang Z. Evaluating alternative DEA models used to control for non-discretionary inputs. Computers and Operations Research 2006;33(5):1173–83.

[22] Olesen OB, Petersen NC. Indicators of ill-conditioned data sets and model misspecification in data envelopment analysis: an extended facet approach. Management Science 1996;42(2):205–19.

[23] Pastor JT, Ruiz JL, Sirvent I. A statistical test for detecting influential observations in DEA. European Journal of Operational Research 1999;115(3):542–54.

[24] Pastor JT, Ruiz JL, Sirvent I. A statistical test for nested radial DEA models. Operations Research 2002;50(4):728–35.

[25] de Sousa MDCS, Stosic B. Technical efficiency of the Brazilian municipalities: Correcting nonparametric frontier measurements for outliers. Journal of Productivity Analysis 2005;24:157–81.

[26] Scheel H. Continuity of the BCC efficiency measure. In: Westermann G, editor. Data envelopment analysis in the service sector. Wiesbaden, Germany: Gabler; 1999.

[27] Seiford L, Fare R, Lovell CAK, Banker RD, Simar L, Forsund F. et al. Summary of some of the discussion at the Advanced Research Workshop on Efficiency Measurement, held at Odense University, May 22–24, 1995. Journal of Productivity Analysis 1996;7(2–3):341–5.

[28] Sexton TR, Silkman RH, Hogan AJ. Data envelopment analysis: critique and extensions. In: Silkman RH, editor. Measuring efficiency: an assessment of data envelopment analysis. San Francisco, CA: Jossey-Bass; 1986.

[29] Shephard RW. Theory of cost and production functions. Princeton, NJ: Princeton University Press; 1970.

[30] Simar L. Detecting outliers in frontier models: a simple approach. Journal of Productivity Analysis 2003;20:391–424.

[31] Wilson PW. Detecting outliers in deterministic nonparametric frontier models with multiple outputs. Journal of Business and Economic Statistics 1993;77(6):779–802.

[32] Wilson PW. Detecting influential observations in data envelopment analysis. Journal of Productivity Analysis 1995;6:27–45.
