Design and VLSI Implementation of Real-Time Weighted Median Filters
Chun-Te Chen, Liang-Gee Chen, Tzi-Dar Chiueh, and Jue-Hsuan Hsiao
Department of Electrical Engineering
National Taiwan University, Taipei, Taiwan, R.O.C.
Abstract
In this paper, a novel design method and VLSI implementation of weighted median filters are presented. The weight parameters are incremented to minimize the mean square error (MSE) between the filter output and the desired signal when a sub-set of the image data is applied. The sub-optimal (optimal) weight set is achieved after 71 training cycles for a 5*5 window size under the MSE criterion. A two-level adder tree architecture with pipelined latches is proposed to find the weighted median output with a versatile definition of weights. It requires only K iterations to find the output, where K is the resolution (number of bits) of the input signal samples. Task-interleaved processing is also adopted to improve the throughput. The final chip layout with pipelined latches for a 5*5 window size is also given in this paper. This high-speed VLSI implementation of the weighted median filter meets real-time application requirements.
1 Introduction
The Weighted Median Filter (WMF) was first introduced by Justusson [1] and further discussed by Brownrigg [2] and Wendt et al. [3]. This filter gives more weight to some samples within the sliding window to allow a degree of control over the smoothing behavior. Each sample in the sliding window is duplicated as many times as its weight defines before sorting; then the center value of the sorted list is chosen as the median output. The total weight sum is equal to Tw, where Tw is an odd positive integer. These weighted median filters are of fundamental importance since they include a large number of extensively used filter types, such as order statistic filters, Center Weighted Median Filters (CWMF) and standard Median Filters (MF). The center weighted median filter proposed in [4] gives more weight only to the center value of the window, and thus it is easier to design and implement than general weighted median filters. However, it is too sensitive to the characteristics of images with impulsive noise. Recently, adaptive stack (weighted) filter techniques [5,6] have been proposed for enhancing images degraded by noise. These adaptive stack filters allow more parameters to be adjusted, not only the center parameter. By adjusting their weights depending on the characteristics of the input image, these filters can outperform the median filter and the center weighted median filter, as shown in [6]. The detail of the image is preserved and impulsive noise is removed efficiently. The optimal weight set is determined by linear programming and some advanced algorithms presented in [5,6]. More interestingly, this estimation approach alleviates the modeling of the signal and noise by taking a part of the input signal to train the weighted filter. However, the problem with these methodologies is that the number of parameters in the linear program grows exponentially with the window width of the filter. In addition, the existing approaches have a slow convergence speed since the adaptation does not utilize recent weight coefficients. Hence, we propose an efficient method to determine the optimal weight set with faster convergence speed.
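As an illustration of the duplicate-and-sort definition of the weighted median given at the start of this section, the following Python sketch may be helpful. The function name `weighted_median` and the sample window are assumptions made for illustration only; they do not come from the paper's implementation. With all weights equal to 1 the function reduces to the standard median, and with extra weight on the center sample only it behaves as a center weighted median.

```python
def weighted_median(samples, weights):
    """Weighted median by the duplicate-and-sort definition.

    Hypothetical helper: each sample is repeated `weight` times, the
    expanded list is sorted, and the center element is returned.
    """
    if len(samples) != len(weights):
        raise ValueError("samples and weights must have the same length")
    if any(not isinstance(w, int) or w < 0 for w in weights):
        raise ValueError("weights must be non-negative integers")
    total = sum(weights)  # Tw in the text
    if total % 2 == 0:
        raise ValueError("the total weight sum Tw must be odd")
    expanded = []
    for x, w in zip(samples, weights):
        expanded.extend([x] * w)  # duplicate the sample w times
    expanded.sort()
    return expanded[total // 2]  # center value of the sorted list


window = [4, 7, 5, 11, 9]
standard = weighted_median(window, [1, 1, 1, 1, 1])         # standard median
center_weighted = weighted_median(window, [1, 1, 3, 1, 1])  # center weighted
general = weighted_median(window, [1, 2, 3, 2, 1])          # general weights
print(standard, center_weighted, general)
```

The sorting cost grows with Tw rather than with the window size, which is the reason a direct implementation of this definition becomes expensive in hardware.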
Real-time implementation of a weighted median filter involves more computation than standard median filters since the number of samples to be sorted is increased. In the sorting network implementation of weighted median filters [7], each sample must be duplicated as many times as its weight defines before being applied to the sorting network. The input size of the network is therefore increased, and the large number of compare-swap units required makes this approach unattractive. An alternative implementation of weighted median filters is based on stack filters. The stack filters introduced by Wendt et al. [3] use positive Boolean functions (PBF) with exponential area-time complexity to perform binary filtering on unit-weight signals. The chip was designed to perform any rank-order operation with a maximum window width of 9 on 6-bit input data. Yli-Harja et al. [8] extended the PBF to realize the weighted median operation with some limitations. Chen [9] proposed a bit-serial tree-search method to reduce the hardware complexity to only one PBF. This approach can only support smaller window sizes, since the hardware complexity increases for larger window sizes. Generally, to perform a median on 25 data (5*5 window size), it needs 5,200,300 terms for the logic OR operation, and each term has 13 parameters for the logic AND operation. Another limitation of the PBF is that the weight set is fixed. This method requires a huge connection network bandwidth for a versatile definition of weights. Hence, we propose an efficient VLSI implementation of weighted median filters whose weights are adjustable to support standard median filters, center weighted median filters, order statistic filters and weighted median filters.

This paper is organized as follows. In Section 2, a training algorithm and design method for weighted median filters is introduced. Section 3 describes the architecture of the weighted median filters. The final conclusion is given in Section 4.

Figure 1: (a) The system block diagram of the weighted median filter definition (input signal, weighted median filter, output, desired signal S(n)). (b) The parameters of the filter definition, W11 to W55 for the 5*5 window, before the training algorithm is applied.
2 A Design Method for Weighted Median Filter
In this section, we propose a training algorithm with a faster convergence rate to determine the weights of the filter. Weighted median filtering is stated as an optimal filtering problem, as shown in Fig. 1(a). The process R(n) at the input of the filter is assumed to be a corrupted version of some desired process S(n). The corruption may be caused either by a noise process N(n) or by some intentional operation, such as a modulation scheme. The goal is to adjust the weights of the filter such that the average mean absolute error (MAE) or mean square error (MSE) per time unit between the filter's output and the desired signal is minimized.
To reduce computation, the class of weighted median filters whose weights are symmetric about the window center is considered, as in [2]. Hence, the number of parameters to be adjusted is reduced to almost 1/4 of that of the unsymmetric weighted median filters. The number of distinct filters is limited to only 53 types for a 3*3 window size when the Brownrigg approach [2] is applied. The optimal weight set is found by exhaustive search over these 53 distinct types. However, when this approach is extended to a 5*5 window size, the number of distinct filters increases to over 5000. Therefore, we cannot use this approach to determine the weights when the window size is larger than 9. Moreover, the methodologies in [5,6] are too complex to implement, and their convergence rate is too slow for real-time applications. Hence, we propose an efficient method, which is simple in VLSI implementation, to determine the optimal weight set with faster convergence speed.
The main idea of the proposed algorithm is described in the following paragraph. In center weighted median filters, the MSE or MAE curve of the training image data, depicted as a function of the central weight, is convex and has a unique minimum at a certain weight. This result has been presented in [4]. We assume that this property is also maintained in the generalized weighted median filter case: the cost function (MAE or MSE) becomes a function of multiple parameters that is convex and has a unique minimum. The initial value of every weight of the weighted median filter is set equal to 1, as in the standard median filter. The central value is adjusted first; therefore, the performance of this filter will be superior to that of the standard median filter. Then the weight set is stored in a register for final determination. The parameter next closest to the center is adjusted after the central value is determined. When the next closest parameter is incremented, the central value must also be incremented to ensure that the function of these parameters is minimized. The cost function is not always less than the previous result; we are forced to accept such results until the minimum of the cost function is reached. This minimum MAE or MSE must be compared with the stored results to determine which weight set is better. Then the parameter third closest to the center is adjusted, and the same procedure goes on. The weights are only increased at each training cycle, since the previous weight sets are stored for restoration. The performance of this filter will also be superior to that of the center weighted median filter, since the weight set with minimum MSE or MAE is accepted in the training procedure.
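The training procedure described above can be sketched in Python as follows. This is a simplified 1-D illustration under stated assumptions, not the authors' implementation: the helper names (`filter_signal`, `train_weights`), the edge padding, the fixed number of increments per parameter (`max_steps`) and the synthetic test signal are all hypothetical. The sketch also omits the coupled increment of the central weight when an outer weight is raised. The center weight is raised in steps of 2 and each symmetric pair in steps of 1 per side, so that the total weight sum stays odd.

```python
import random


def weighted_median(samples, weights):
    # Duplicate each sample by its weight, sort, and take the center value.
    expanded = sorted(x for x, w in zip(samples, weights) for _ in range(w))
    return expanded[len(expanded) // 2]


def filter_signal(noisy, weights):
    # Slide the window over the signal; edges are padded by replication
    # (an assumption, the paper does not specify border handling).
    half = len(weights) // 2
    padded = [noisy[0]] * half + list(noisy) + [noisy[-1]] * half
    return [weighted_median(padded[i:i + len(weights)], weights)
            for i in range(len(noisy))]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def train_weights(noisy, desired, size=5, max_steps=8):
    """Greedy, center-outward weight training with symmetric weights."""
    center = size // 2
    best = [1] * size  # start from the standard median filter
    best_cost = mse(filter_signal(noisy, best), desired)
    for offset in range(center + 1):  # center first, then outward
        trial = list(best)
        for _ in range(max_steps):
            if offset == 0:
                trial[center] += 2  # keeps the weight sum odd
            else:
                trial[center - offset] += 1
                trial[center + offset] += 1
            cost = mse(filter_signal(noisy, trial), desired)
            # Worse intermediate results are tolerated; only the best
            # weight set seen so far is stored for restoration.
            if cost < best_cost:
                best, best_cost = list(trial), cost
    return best, best_cost


random.seed(0)
desired = [50] * 40 + [200] * 40  # synthetic step edge
noisy = [255 if random.random() < 0.1 else s + random.randint(-5, 5)
         for s in desired]
weights, cost = train_weights(noisy, desired)
baseline = mse(filter_signal(noisy, [1] * 5), desired)
print(weights, cost, baseline)
```

Because the stored best set starts at the all-ones weights, the trained filter can never be worse than the standard median filter on the training data, which mirrors the argument made in the text.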
The upper left quarter of the Lena image with noise was used for training to get the sub-optimal weight set for a 5*5 window size. The noisy image is generated by corrupting the original Lena image with Gaussian noise with zero mean and variance 30. The probability of impulses is 0.1. The training curve of the proposed design under MSE or MAE is shown in Fig. 2. The sub-optimal (optimal) weight set is achieved after 71 training cycles for the MSE criterion or 75 training cycles for the MAE criterion. With this sub-optimal weight set, the filter outperforms median filters and center weighted median filters on the full-size Lena image, as shown in Table 1.

Figure 2: The sub-set of image data with noise is applied in training to get the optimal weight set. The convergence curves of the proposed algorithm under the MSE (or MAE) criterion, plotted against the number of iterations.
3 Proposed Architecture for Weighted Median Filter
A detailed architecture design for weighted median filters is discussed in this section. First, we introduce a novel pipelined method for finding the median of the weighted median filter. Then its architecture with task interleaving, adopted to improve throughput, is presented. The finer bit-level pipelining is also considered in this section.
[...] with Gaussian noise (variance = 30) and impulsive noise

[...] operation is to find a minimum boundary value (w_k) in the sliding window [...] divides the original samples in the window into two sub-sets, S0 and S1. The samples larger than (w_k) are put into S1; otherwise they are put into S0. And the associated [...] than M = (Tw + 1)/2, where [...] Then the second larger [...] the partition, and so on [...] sum of the samples in S1 [...] output of the weighted median filter is the sample [...]

The computational [...] Hence, an efficient [...] evaluate the weight sum is required for real-time implementation. A radix search method, which does not directly find the maximum sample, is proposed to find the boundary value as [...] method. The most significant bit (MSB) of the boundary value is set equal to 1. The samples in the window are divided into two sub-sets by this boundary value. If the weight sum associated with S1 is larger than or equal to the value M, the correct output bit of the weighted median filter is "1", otherwise "0". In the former case, the samples whose MSB is 0 belong to the S0 partition and are labeled. On the other hand, if the weight sum is less than M, the samples whose MSB is 1 belong to the S1 partition and are labeled. Once the labels of the samples are determined, they keep these labels until the final bit of the boundary value is determined. The second MSB of the boundary value is determined in the same manner. After K iterations, the final boundary value is determined for K-bit input data. The area-time complexity of the proposed realization is O(M), as compared
to O(N) for direct implementation. An example of the weighted median filter is shown in Fig. 3. The weighted median filter output is obtained after K (= 4) iterations.

W[.] = [1 2 3 2 1]
X[.] = [4 7 5 11 9] = [0100 0111 0101 1011 1001], Tw = 9, M = 5
k = 4: WM[.] = [. . . . .] -> 0+0+0+2+1 = 3 < 5, Y(4) = 0
k = 3: WM[.] = [. . . 1 1] -> 1+2+3+2+1 = 9 >= 5, Y(3) = 1
k = 2: WM[.] = [. . . 1 1] -> 0+2+0+2+1 = 5 >= 5, Y(2) = 1
k = 1: WM[.] = [0 . 0 1 1] -> 0+2+0+2+1 = 5 >= 5, Y(1) = 1
The weighted median filter output = Y(0111) = 7.
Figure 3: An example of the weighted median filter.
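The radix search walked through in Fig. 3 can be sketched in Python as follows. The function name `weighted_median_bit_serial` and the label encoding (`None` for undecided, 1 for S1, 0 for S0) are assumptions made for illustration; the sketch models only the arithmetic of the K iterations, not the pipelined hardware.

```python
def weighted_median_bit_serial(samples, weights, k_bits):
    """Find the weighted median one output bit per iteration, MSB first."""
    m = (sum(weights) + 1) // 2  # threshold M = (Tw + 1) / 2
    # Label per sample: None = undecided, 1 = already in S1 (larger than
    # the boundary), 0 = already in S0 (smaller than the boundary).
    labels = [None] * len(samples)
    output = 0
    for bit in range(k_bits - 1, -1, -1):
        # A labeled sample contributes its label; an undecided sample
        # contributes its current bit.
        bits = [labels[i] if labels[i] is not None else (x >> bit) & 1
                for i, x in enumerate(samples)]
        weight_sum = sum(w for w, b in zip(weights, bits) if b)
        y = 1 if weight_sum >= m else 0  # current output bit
        output = (output << 1) | y
        # Undecided samples whose bit differs from the output bit are now
        # known to lie above (bit 1) or below (bit 0) the boundary.
        for i, b in enumerate(bits):
            if labels[i] is None and b != y:
                labels[i] = b
    return output


result = weighted_median_bit_serial([4, 7, 5, 11, 9], [1, 2, 3, 2, 1], 4)
print(result)  # 7, matching the example of Fig. 3
```

Each iteration needs one weight sum and one comparison against M, so a K-bit input requires exactly K iterations regardless of the weight values.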
3.2 An Architecture of Weighted Median Finding
In boundary value finding, the weight sum operation is the most computation-intensive task. In the traditional PBF implementation, it not only requires a large number of AND or OR gates but is also hard to divide into many pipelined stages to improve throughput. Hence, a two-level tree architecture with pipelined latches to sum the corresponding weights is proposed. The smoothness of the data flow is also considered; otherwise extra hardware would be induced. We use several row adder tree (RAT) modules and one column adder tree (CAT) module, as shown in Fig. 4, to realize the proposed algorithm for the 2-D case. The outputs of the WM module are multiplied in parallel by the associated weights before being applied to the adder tree. The outputs of the RAT modules are passed down to the CAT module to evaluate the final result. This final result is added to the 2's complement of the value M. The carry bit of this final adder is used to determine the associated median output bit and to update the associated label matrix. In this architecture, all additions are executed from the LSB (least significant bit) to the MSB, while the weighted median output comes out from the MSB to the LSB.
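The role of the final adder can be illustrated with a small behavioral sketch in Python. The function names (`row_adder_tree`, `column_adder_tree`, `decide_output_bit`) and the 8-bit adder width are assumptions for illustration; the sketch models the arithmetic only and ignores the pipelined latches. The key point is that adding the 2's complement of M to the weight sum produces a carry-out of 1 exactly when the weight sum is at least M.

```python
def row_adder_tree(bits_row, weights_row):
    # RAT: sum the weights of one window row whose bit (or label) is 1.
    return sum(w for b, w in zip(bits_row, weights_row) if b)


def column_adder_tree(row_sums):
    # CAT: combine the partial sums delivered by the RAT modules.
    return sum(row_sums)


def decide_output_bit(bits, weights, m, width=8):
    """Return the carry-out of (weight sum) + (2's complement of M)."""
    row_sums = [row_adder_tree(b, w) for b, w in zip(bits, weights)]
    total = column_adder_tree(row_sums)
    mask = (1 << width) - 1
    neg_m = (-m) & mask           # 2's complement of M in `width` bits
    result = total + neg_m
    return (result >> width) & 1  # carry is 1 exactly when total >= m


weights = [[1] * 5 for _ in range(5)]  # all-ones weights, Tw = 25
m = (25 + 1) // 2                      # M = 13
bits = [[1] * 5, [1] * 5, [1, 1, 1, 0, 0], [0] * 5, [0] * 5]  # 13 ones
carry = decide_output_bit(bits, weights, m)
print(carry)  # weight sum 13 >= 13, so the carry is 1
```

The sketch assumes 1 <= M and a weight sum that fits in `width` bits, which is consistent with a maximum total weight sum of 255 for an 8-bit adder.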
The input word data must be converted into bit-serial form for weighted median finding. Here, we introduce a new method combined with task-interleaved processing, as shown in Fig. 6. The timing relationship of the input data, interleaved tasks, and bit-serial output is also shown at the bottom of this figure. For the 5*5 window size case, 5 bit streams are generated from this data arrangement module to drive the WM module, as shown in Fig. 5.

Figure 4: A two-level tree architecture of the weighted median filter.
In the WM module shown in Fig. 6, the labels of every task are stored in a ring-register, which shifts one position per clock cycle to correspond to the proper interleaved task. The label of a sample in the ring-register also shifts one position per cycle to evaluate the next sliding window. The input data of the WM module are arranged in bit-parallel form by the data rearrange module on its left and are passed on to the WM module on its right.
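The word-to-bit-serial conversion and the rotation of tasks through the ring-register can be pictured with the following Python sketch. It is a rough behavioral model under strong assumptions: the function names are hypothetical, all tasks are assumed to start in the same cycle, and the round-robin order (task index equal to the cycle number modulo the number of tasks) is an assumed schedule, not the timing of the actual chip.

```python
def to_bit_serial(word, k_bits=8):
    # Emit the bits of one input word MSB first, as consumed by the
    # median search.
    return [(word >> b) & 1 for b in range(k_bits - 1, -1, -1)]


def interleaved_schedule(num_tasks, k_bits):
    """List (cycle, task, bit_index) triples in round-robin order.

    Each task processes one bit position, then waits while the other
    tasks take their turn, like labels rotating in a ring-register.
    """
    schedule = []
    for cycle in range(num_tasks * k_bits):
        task = cycle % num_tasks
        bit_index = k_bits - 1 - cycle // num_tasks
        schedule.append((cycle, task, bit_index))
    return schedule


stream = to_bit_serial(0b1011, k_bits=4)
schedule = interleaved_schedule(num_tasks=8, k_bits=8)
print(stream)
print(schedule[:3], schedule[-1])
```

In this model, 8 tasks of 8 iterations each occupy 64 consecutive cycles with no idle slots, which is the sense in which interleaving keeps the pipeline full.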
For more flexibility, the weight registers in the WM module are connected in cascade as a shift register. A new weight set is loaded in sequence from a single input port. It has 2*(log2 N + 2) registers to store the labels of the interleaved tasks, where N is the window size of the 2-D filter. The independent tasks are inserted in an interleaved manner into the waiting stages. The process of K iterations to find a single output is defined as a single task. We can execute 8 tasks with the 5 lines of input streams for a 2-D weighted median filter with a 5*5 window size. There are 320 (= 8 bits * 8 tasks * 5 streams) registers to implement the word-parallel/bit-serial conversion with the task interleaving function.

The layout of the chip for a 5*5 window size is displayed in Fig. 7. The total maximum weight sum is limited to 255. The chip takes about 36 pins and 2,491 standard cells, which is fewer than that of the PBF implementation (5,200,300). The active area of the weighted median finding circuit is about 3.91 * 3.91 mm^2. The function of this chip is verified by a Verilog simulator, and the critical path or pipeline latency (8-bit adder) is about 16 ns for the 0.8 micron CCL standard cell library. The proposed chip can support weighted median filters, standard median filters and order statistic filters, since the weight set is downloaded from outside the chip. Compared with the Positive Boolean Function implementation in [9], the proposed two-level tree architecture has fewer logic gates even with pipelined latches. This architecture is very regular and easy to pipeline at the bit level, because the weight sum operation and the 2's complement addition of M can both be executed from LSB to MSB. Hence, we can insert 16 tasks (more than 8) with some modification. The pipeline cycle is then reduced to only a one-bit carry-save adder (CSA). This bit-level pipelined architecture, which extends the current word-level version, can support higher speed applications.

Figure 5: (a) Architecture of the word-input/bit-serial converter with task interleaving. (b) The architecture of the input-shared shift register module.

Figure 6: The architecture of the WM module with a ring-register for the task interleaving process.

Figure 7: The final layout of the weighted median filter.
4 Conclusions
A design method to train the weighted median filter and its VLSI implementation are presented in this paper. The sub-optimal weight set is achieved rapidly with the proposed training algorithm. In the experimental results, the MSE curve drops very fast within a few training cycles and then becomes smooth. Hence, the number of training cycles can be reduced while still obtaining an acceptable result. The detail of the image is preserved efficiently, which overcomes the smoothed output of the center weighted median filter. An arithmetic method for the weighted median filter is also proposed. It requires K iterations of the weight sum operation to complete the full median finding operation. The adder tree architecture is adopted to reduce each cycle time. By task interleaving, adjacent tasks can be interleaved mutually exclusively in order to fill the pipeline 100%. The throughput increases with the number of tasks inserted. The proposed design is regular and can be pipelined at the bit level for high-speed applications. The final chip layout for the generalized weighted median filter with a 5*5 window size is also given in this paper. Non-integer weight sets can be either normalized or shifted to integers without loss of generality. Obviously, this chip can support standard median filters and other adaptive stack filters [5,6]. In the future, the bit-level pipelined architecture of the median finding operation will also be realized in VLSI, and real-world applications will be developed based on the proposed architecture and training algorithm.
References
[1] B. I. Justusson, "Median filtering: Statistical properties," in Topics in Applied Physics, Two-Dimensional Digital Signal Processing II, T. S. Huang, Ed. Berlin: Springer, 1981.
[2] D. R. K. Brownrigg, "The weighted median filter," Commun. Assoc. Comput. Mach., vol. 27, pp. 807-818, Aug. 1984.
[3] P. D. Wendt, E. J. Coyle, and N. C. Gallagher, "Stack filters," IEEE Trans. on Acoust., Speech, and Signal Processing, vol. 34, pp. 898-911, 1986.
[4] S. J. Ko and Y. H. Lee, "Center weighted median filters and their application to image enhancement," IEEE Trans. on Circuits and Systems, vol. 38, pp. 984-993, 1991.
[5] J. H. Lin, T. M. Sellke, and E. J. Coyle, "Adaptive stack filtering under the mean absolute error criterion," IEEE Trans. on Acoust., Speech, and Signal Processing, vol. 38, no. 6, pp. 938-954, 1990.
[6] L. Yin, J. T. Astola, and Y. A. Neuvo, "Adaptive stack filtering with application to image processing," IEEE Trans. on Signal Processing, vol. 41, pp. 162-184, 1993.
[7] M. Karaman, L. Onural, and A. Atalar, "Design and implementation of a general-purpose median filter unit in CMOS VLSI," IEEE Journal of Solid-State Circuits, vol. 25, no. 2, pp. 505-512, 1990.
[8] O. Yli-Harja, J. Astola, and Y. Neuvo, "Analysis of the properties of median and weighted median filters using threshold logic and stack filter representation," IEEE Trans. on Signal Processing, vol. 39, pp. 395-410, 1991.
[9] K. Chen, "Bit-serial realizations of a class of nonlinear filters based on positive Boolean functions,"