Parallel B-Spline Surface Fitting on Mesh-Connected Computers

(1)

sors. Later, some cost-optimal parallel solvers were pre-sented, where cost is defined to be the product of time and the number of processors used [19]. Using a newly proposed computational model called the mesh-of-unshuf-fle network, Chung and Lin [7] presented an O(log m log n)-time parallel algorithm using O(mn/(log m log n)) processors. On a hypercube network, a similar result was presented by Lin and Chung [20]. Using the pipelining strategy [11], an O(log n9)-time parallel algorithm was presented on the same network with O(mn/log n9) proces-sors [13], where n9 5 max(m, n). Cheng et al. [6] presented the first constant-time parallel algorithm on a mesh-con-nected computer with O(mn) processors based on the Chebyshev iterative method; they also gave the corre-sponding relative error analysis. Note that their algorithm [6] is time- and cost-optimal.

Based on the matrix perturbation method, this paper first presents a novel approximate O(n/p)-time parallel B-spline curve fitting algorithm for finding the corresponding

n control points that interpolate those n data points on a

linear array processor with p processors, where p # n. Given m 3 n data points, we then present an O(mn/ ( p1p2))-time parallel algorithm for B-spline surface fitting on a p13 p2mesh-connected computer, where p1# m and

p2# n. The relative error analyses of our two stable and cost-optimal parallel solvers are also given. When setting

p15 m and p25 n, a time- and cost-optimal parallel solver can be derived; our result is a direct method vs. the result of Cheng et al. [6].

2. PARALLEL B-SPLINE CURVE INTERPOLATOR Suppose we are given a set of 3-dimensional data points,

Bi 5 (b (1) i , b (2) i , b (3) i ) for 1# i # n. According to [2, 15],

for uniform B-spline curve, each data point can be ex-pressed by a weighted average of three control points:

Bi 5 Ah(Ci21 1 4Ci 1 Ci11), 1 # i # n, where Ci 5

(c(1)_i , c(2)_i , c(3)_i ). They form a system of n equations in n1 2 unknowns for all given points. In order to completely solve the system, we need the following two additional equations to specify how the boundary control points are interpolated: C05 C1; Cn115 Cn. For simplicity, we only

consider b 5 (b1, b2, ..., bn)T 5 (b(1)1 , b (1) 2 , ..., b (1) n )T and c 5 (c1, c2, ..., cn)T 5 (c(1)1 , c (1) 2 , ..., c (1) n )T throughout this ARTICLE NO.0082

Parallel B-Spline Surface Fitting on Mesh-Connected Computers

KUO-LIANGCHUNG*,1AND WEN-MING YAN†,2

*Department of Information Management, National Taiwan Institute of Technology, No. 43, Section 4, Keelung Road,

Taipei, Taiwan 10672, Republic of China; and †Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan 10764, Republic of China

0743-7315/96 $18.00 Copyright1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

205

The solution of uniform bicubic B-spline curve/surface fitting problem is considered. Based on the matrix perturbation method, this paper first presents a novel approximate O(n/p)-time parallel B-spline curve fitting algorithm for finding the corresponding n control points that interpolate those n data points on a linear array processor with p processors, where p # n. Given m 3 n data points, we then present an O(mn/

( p1p2))-time parallel algorithm for solving the uniform bicubic

B-spline surface fitting problem on a p13 p2mesh-connected

computer, where p1# m and p2# n. The relative error analyses

of our two stable and cost-optimal parallel solvers are also given. When setting p15 m and p25 n, a constant-time parallel

solver for B-spline surface fitting can be derived; this time- and cost-optimal result is a direct method, in contrast to the parallel iterative method of Cheng et al. (Parallel B-spline surface inter-polation on a mesh-connected processor array, J. Parallel Dis-trib. Comput. 24, 2 (1995), 224–229). 1996 Academic Press, Inc.

1. INTRODUCTION

Surface fitting is very important in the fields of CAD, CAM, robotic path planning, graphics, pattern recognition, and image processing. Due to the local change property, B-spline surface interpolation is a good fitting tool to con-struct a smooth surface that fits those given points in the space [15, 2]. From the viewpoint of filters, finding the control points can be solved by using an inverse filtering operation [17].

Suppose we are given a set of m3 n data points to be interpolated. Based on the Gauss–Seidel iteration method, Ajjanagadde and Patnaik [1] presented the first parallel algorithm to perform uniform bicubic B-spline surface fit-ting in O(n1 p) time using O(pn) processors, where p is the number of iterations specified by the user. Then Cheng and Goshtasby [4] presented another parallel iterative solver based on the SLOR method to find the control points in O(m log n) time using O(n) processors. Based on the cyclic reduction method [18], Cheng and Goshtasby [5] presented the first logarithmic-time parallel algorithm, which takes O(log m1 log n) time using O(mn)

proces-1

E-mail: [email protected].

2

(2)

which implies that a2 b 5 4 and 2ab 5 1; this then implies that b5 21/a and a 5 2 6 Ï3. Since we wish the matrices

L9 and U9 to be diagonally dominant, we will select the

sign so that the absolute value of a is greater than 1. There-fore, let a5 2 1 Ï3 and b 5 21/a 5 Ï3 2 2.

The computation of a and b provides the Toeplitz factor-ization of matrix A9q(phase 1 in our algorithm), which can

be computed in parallel by each processor in O(1) time on the array processor.

2.2. Phase 2: Substitution Procedure

Initially, we partition b into p parts and b 5 (b0T_{, b}1T_{, ..., b}p21T₎T_{, where b}i_{5 (6b}

iq11, 6biq12, ..., 6biq1q)T

for 0 # i # p 2 1. The ith processor first solves A9qzi5

bi_{, 0}_{# i # p 2 1, where z}i_{5 (z}

iq11, ziq12, ..., ziq1q)T, that

is, the ith processor solves L9qU9qzi5 bi. It can be verified

that each processor in the linear array processor needs about 5q time to perform the forward and backward substi-tution procedure for solving L9qU9qzi5 bi (second phase

in our algorithm) sequentially [16]. Consequently, it follows that

Anz2 b 5 gpen1 hpe11

O

p21 i51 (gieiq1 hieiq11), (4) where gi5 ziq11, hi5 ziq1 (2 2 Ï3)ziq11, 1# i # p 2 1, (5) gp5 zn, hp5 (3 2 Ï3)z1. (6)

By (4), the solution of c in (1) will be determined approx-imately, say, z2 p, in Section 2.3 later, where Ap P gpen1 hpe11

o

p21

i51( gieiq1 hieiq11). To estimate the bound of the relative error, we need the following two auxiliary results. Herei ● i denotes the infinity norm of a vector.

LEMMA1 [13]. If Ax5 b, where A 5 [aij]n3nis a diago-nally dominant matrix, then ixi # (1/min1_#i#n (uaiiu 2

o

n

j51, j?iuaiju)) ibi.

For A9 in (2), because min1#i#n(uaiiu 2

o

n

j51, j?iuaiju) 5 2,

we have the immediate result.

COROLLARY2 [13]. If A9x9 5 b9, then ix9i # Asib9i.

From A9qzi5 bifor 0 # i # p 2 1 and by Corollary 2,

izi_{i # Asib}i_{i # Asibi.} ₍₇₎

2.3. Phase 3: Update Procedure

From (4), the solution of c in (1) will be determined approximately in this section and only local communica-section. Thus, the above system of linear equations can be

equivalently transformed into

Ac5 6b, (1) where A5

1

5 1 1 4 1 ? ? ? 1 4 1 1 5

2

.

Our parallel solver consists of three phases called the Toeplitz factorization, the substitution procedure, and the update procedure, respectively. Suppose we have a linear array processor with p processors. Figure 1 shows a linear array processor with four processors, where the processor identity inside the square box denotes the address of the processor. For convenience, suppose n is a multiple of the number of processors used, i.e., n 5 pq. Throughout the remainder of this section, matrices are represented by up-percase letters, vectors by bold lowercase letters, and sca-lars by plain lowercase letters. The superscript T corres-ponds to the transpose operation. We now show how each phase can be accomplished on the linear array network in a highly parallel way.

2.1. Phase 1: Toeplitz Factorization

Let A9q5

1

a 1 1 4 1 ? ? ? 1 4 1 1 4

2

q3q 5 L9qU9q, (2) L9q5

1

1 2b 1 ? ? 2b 1 2b 1

2

q3q , and U9q5

1

a 1 a 1 ? ? a 1 a

2

q3q , (3)

(3)

tions between adjacent processors on the linear array pro-cessor will be needed. Then, the bound of the relative error will be analyzed.

From a2 b 5 4 and 2ab 5 1, we have 1 1 4b 1 b2₅ 0. Let pk 5 (0, ..., 0, 1, b, ..., b

  



t, 0, ..., 0)





T and qk 5

k n2k

(0, ..., 0, bt_{, ..., b, 1, 0, ..., 0)}T _{for q} _{# k # n 2 q 1 1.}







  

k n2k

Here we assume that q . t. In addition, we let p1 5 (1, b, ..., bt21_{, b}t_{, 0, ..., 0)}T_{and q} n5 (0, ..., 0, bt, bt21, ..., b, 1)T_{. Accordingly, we have} Apk5 ek211 (2 1 Ï3)ek1 bt(2bek1t1 ek1t11), (8) Aqk5 bt(ek2t212 bek2t)1 (2 1 Ï3)ek1 ek11, (9) Ap15 (3 1 Ï3)e11 bt(2bet111 et12), (10) Aqn5 (3 1 Ï3)en1 bt(en2t212 ben2t). (11)

By (8), (9), (10), and (11), for 1 # i # p 2 1, we have

A(uipiq111 viqiq)5 (ui1 (2 1 Ï3)vi)eiq

1 ((2 1 Ï3)ui1 vi)eiq11 1 uibt(2beiq1t111 eiq1s12)

1 vibt(eiq2t212 beiq2t), (12) A(upp11 vpqn)5 (3 1 Ï3)vpen1 (3 1 Ï3)upe1

1 upbs(2bet111 et12)

1 vpbt(en2t212 ben2t). (13)

Let ui1 (2 1 Ï3)vi5 giand (21 Ï3)ui1 vi5 hifor

1 # i # p 2 1; (3 1 Ï3)vp5 gpand (31 Ï3)up5 hp, by (5) and (6), ui5 gi2 (2 1 Ï3)hi 26 2 4Ï3 5 (21 Ï3)ziq 61 4Ï3 , vi5 hi2 (2 1 Ï3)gi 26 2 4Ï3 5 2Ï3ziq112 ziq 61 4Ï3 , 1# i # p 2 1, (14) up5 hp 31 Ï35 32 Ï3 31 Ï3z1, vp5 gp 31 Ï35 zn 31 Ï3. (15) Let p5 (upp11 vpqn)2

O

p21 i51 (uipiq111 viqiq), (16) by (12), (13), (14), and (15), we have Ap P gpen 1 hpe11

o

p21

i51 (gieiq 1 hieiq11). Recall that the solution of c in (1) is determined approximately by c5 z 2 p; then we have the bound of the corresponding relative error.

THEOREM 3 [13]. Let c5 A21b; i.e., let c be the exact

solution of Ax5 b; then

iAnc2 bi # 0.173ubtu ibi and i

c2 ci

ici # 0.519ubtu.

The above three-phase parallel algorithm for solving (1) can be written as follows, where the ith, 0 # i # p 2 1, processor in the linear array processor returns the partial solution vector ci_. ALGORITHM1 FOR i :5 0 TO p 2 1 DO IN PARALLEL (1) Solve Azi_{5 b}i_; (2) Evaluate ui, vi11; (3) ci_{r z}i_{2 u} i(1, b, b2, ..., bt, 0, ..., 0)T2 vi11(0, ..., 0, bt_{, ..., b}2_{, b, 1)}T_; END

Following the above algorithm, the parallel pseudo code performed by processor (node) i, 0# i # p 2 1, is referred to [13]. Since each processor wants to solve a small linear system of size O(n/p) and only local communications be-tween adjacent processors on the linear array processor are needed, we have the following result.

THEOREM4. Given n data points, the uniform B-spline

curve fitting problem can be solved in O(n/p) time on the linear array processor with O(p) processors; the relative error is#0.519(2 2 Ï3)t_{for t}_{, n/p.}

When setting p5 O(n), a constant-time parallel algorithm for B-spline curve fitting is obtained. However, it is a trade-off between the time needed and the relative error desired. We have presented the parallel B-spline curve fitting on the linear array processor. The next section will extend this result to solve the B-spline surface fitting on the mesh-connected computer and derive the corresponding rela-tive error.

3. PARALLEL B-SPLINE SURFACE INTERPOLATOR Suppose we are given a set of 3-dimensional points,

Bi, j5 (b(1)i, j, b

(2)

i, j, b

(3)

i, j) for 1# i # m, 1 # j # n. The uniform

bicubic B-spline surface for interpolating these m3 n data points consists of (m2 1) 3 (n 2 1) patches Si, j(u, v) for

0 # u, v , 1, 1 # i # m 2 1, and 1 # j # n 2 1. Each patch is defined by a bicubic polynomial si, j(u, v) 5

(1/36)[u3_{, u}2_{, u, 1] NQ} i, jNt[v3, v2, v, 1]t, where N5

1

21 3 23 1 3 26 3 0 23 0 3 0 1 4 1 0

2

and Qi, j5

1

Ci21, j21 Ci21, j Ci21, j11 Ci21, j12 C_{i, j2}1 Ci, j Ci, j11 Ci, j12 Ci11, j21 Ci11, j Ci11, j11 Ci11, j12 C_i12, j21 Ci12, j Ci12, j11 Ci12, j12

2

,

(4)

Since B can be factorized into B 5 CD, where C5

1

An An . . . An An

2

m3m and D5

1

5In In In 4In In . . . . . . . . . In 4In In In 5In

2

m3m ,

the above system, Bc5 b, can be transformed into

An3nhkil5 bkilfor 1# i # m, (17) Am3mc[ j]5 h[ j]for 1# j # n, (18)

where hkil_{5 (h}

i,1, hi,2, ..., hi,n)T, bkil5 (bi,1, bi,2, ..., bi,n)T

for 1# i # m, c[ j]_{5 (c1, j}_{, c2, j}_{, ..., c}

m, j)T, h[ j]5 (h1, j, h2, j, ..., hm, j)T

for 1# j # n.

Therefore, B-spline surface interpolation becomes a two-part process, namely, solving the m special tridiagonal systems in (17) for hkil_{, 1}_{# i # m, first and then solving}

the n tridiagonal systems in (18) for c[ j ]_{, 1} _{# j # n. We} now present the parallel algorithm for B-spline surface fitting on a mesh-connected computer. The mesh-con-nected computer with p13 p2processors is shown in Fig. 2, where all the rows and columns are linear array proces-sors. For simplicity, suppose m5 p1q1and n5 p2q2.

The linear array processor in row j, 0 # j # p12 1, is responsible for solving A_n3nhkkl_{5 36b}kkl_{for jq}

11 1 # k # ( j 1 1)q1 by applying Algorithm 1 q1 times. Since each system A_n3nhkil5 bkilcan be solved in O(n/p2) time, totally it takes O(mn/( p1p2)) time for solving (17) and it gives iAn3nhkil2 bkili # 0.173ubtu ibkili. Therefore, we have

iCh 2 bi # 0.173ubt_{u max}

1#i#mib kil_{i 5 0.173ub}t_{u ibi.} where Ci, j5 (c (1) i, j , c (2) i, j, c (3) i, j), 0# i # m 1 1, 0 # j # n 1

1, are the control points of the surface to be determined. Our task is to compute these control points.

Each data point can be expressed by a weighted average of nine control points: Bi, j 5 (1/36)(Ci21, j21 1 4Ci, j21 1

Ci11, j21 1 4Ci21, j1 16Ci, j 1 4Ci11, j 1 Ci21, j111 4Ci, j11 1

Ci11,j11) for 1# i # m, 1 # j # n. They form a system of mn equations in (m 1 2)(n 1 2) unknowns. In order to

completely solve the system, commonly we need the fol-lowing 2(m1 n 1 2) 5 (m 1 2)(n 1 2) 2 mn additional equations to specify how the boundary control points are interpolated: C0, j5 C1, j; Cm11, j 5 Cm, j, 1# j # n; Ci, 0 5 Ci, 1;Ci,n115 Ci,n, 0 # i # m 1 1. For simplicity, we only

consider Bi, j5 b

(1)

i, j, 1# i # m and 1 # j # n. Throughout

the remainder of this section, matrices are also represented by uppercase letters, vectors by bold lowercase letters, and scalars by plain lowercase letters. The superscript T corresponds to the transpose operation. That is, we con-sider the given set of data points

bT_{5 36(b}(1) 1,1, b (1) 1,2, ..., b (1) 1,n, b (1) 2,1, b (1) 2,2, ..., b (1) 2,n, ..., b(1)_m,1, b(1)_m,2, ..., b(1)_m,n) 5 (b1,1, b1,2, ..., b1,n, b2,1, b2,2, ..., b2,n, ..., bm,1, bm,2, ..., bm,n)

and the corresponding control points (to be determined)

cT_{5 (c}(1) 1,1, c (1) 1,2, ..., c (1) 1,n, c (1) 2,1, c (1) 2,2, ..., c (1) 2,n, ..., c(1)_m,1, c(1)_m,2, ..., c(1)_m,n) 5 (c1,1, c1,2, ..., c1,n, c2,1, c2,2, ..., c2,n, ..., cm,1, cm,2, ..., cm,n).

Employing those additional boundary conditions, our task becomes to solve Bc5 b, where B is a block tridiagonal matrix of the following form:

Bmn3mn5

1

5A

n

A

n

A

n

4A

n

A

n

.

A

n

4A

n

A

n

A

n

5A

n

2

m3m ,

(5)

In contrast, the linear array processor in column j, 0#

j # p2 2 1, is responsible for solving Am3mc[k] 5 h[k] for jq21 1 # k # (j 1 1)q2by applying Algorithm 1 q2times. Each system Am3mc[i]5 h[i]can be solved in O(m/p1) time,

so it takes O(mn/( p1p2)) time in total to solve (18). We haveiAm3mc[ j]2 h[ j ]i # 0.173ubtu ih[ j ]i. Then we have

iDc 2 hi # 0.173ubt_{u max}

1#j#nih

[ j ]_{i 5 0.173ub}t_{i ihi.}

Because ofiCh 2 bi # 0.173ubt_{u ibi, iDc 2 hi # 0.173ub}t_u

ihi, and iCi # 6, it follows that iChi # iCh 2 bz 1 ibi

# (1 1 0.173ubt_u)ibi,

ihi # AsiChi #11 0.173ubtu

2 ibi,

iBc 2 bi 5 iCDc 2 bi

# iCi iDc 2 hi 1 iCh 2 bi # 6 3 0.173ubt_{u ihi 1 0.173ub}t_{u ibi}

# 0.173ubt_{u[3(1 1 0.173ub}t_{u) 1 1]ibi.}

By c5 B21_{b and Corollary 2, we have}

ic 2 ci # As iDc 2 Dci # AfiCDc 2 CDci 5 AfiBc 2 bi and

ibi 5 iBci # iBiyici 5 36ici.

Combining the above two inequalities, we have ic 2 ci # AfiBc 2 bi

#0.173 4 ub

t_{u[3(1 1 0.173ub}t_{u) 1 1]ibi}

# 0.173 3 9ubt_{u[3(1 1 0.173ub}t_{u) 1 1]ici,}

that is, ic 2 ci/ici # 1.557ubt_{u[3(1 1 0.173ub}t_{u) 1 1].}

To save space, we omit the parallel pseudo code for B-spline surface fitting. Finally, we have the main result.

THEOREM5. Given m3 n data points, the uniform

B-spline surface fitting problem can be solved in O(mn/( p1p2))

time on the mesh-connected computer with O( p1p2)

proces-sors; the relative error is# 1.557u(2 2 Ï3)t_{u[3(1 1 0.173u}

(22 Ï3)t_{u) 1 1] for t , min(m/p1}_{, n/p}

2).

When setting p15 m and p25 n, a constant-time parallel solver for B-spline surface fitting can be obtained.

4. CONCLUSIONS

The significance of B-spline surface fitting is due to its popular use in the areas of computer graphics, CAD, CAM, image processing, etc. Our main contribution is to present a novel approximate parallel algorithm for solving B-spline interpolation problem and to show that on the mesh-con-nected computer with O( p1p2) processors, our algorithm can be performed in O(mn/( p1p2)) time. The relative error analyses have also been given.

Using the same matrix perturbation method proposed in this paper and the Sherman–Morrison formula [3], the product-expansion based vectorized algorithms for solving B-spline curve and surface fitting were presented in [8, 9, 10]. The extension to solve the special tridiagonal systems has been developed in [23, 14]. Further, our parallel algo-rithm can be applied to solve the closed B-spline surface fitting problem on a torus (wraparound mesh) multiproces-sor directly. In addition, the results of this paper can also be applied to solve the diagonally dominant block tridiagonal system to achieve better performance.

It is interesting to employ the other parallel tridiagonal solvers [21, 22] to handle the same surface fitting problem.

ACKNOWLEDGMENTS

The authors appreciate the help of the three referees, Dr. F. Cheng, Dr. J. Gustafson, and Dr. S. Sahni which led to an improved version of this paper. This research was supported in part by the National Science Council of the Republic of China under Contract NSC85-2121-M001-002.

REFERENCES

1. Ajjanagadde, V. G., and Patnaik, L. M., Systolic architecture for B-spline surfaces. Int. J. Parallel Programming 15, 6 (1986), 551–565. 2. Bartels, R. H., Beatty, J. C., and Barsky, B. A., ‘‘An Introduction to

Splines for Use in Computer Graphics and Geometric Modeling.’’

Morgan Kaufmann, San Mateo, CA, 1987.

3. Bartlett, M. S. An inverse matrix adjustment arising in discriminant analysis. Ann. Math. Statist. 22 (1951), 107–111.

4. Cheng, F., and Goshtasby, A. B-spline surface interpolation using SLOR method with parallel relaxation. Technical Report 96-87, De-partment of Computer Science, University of Kentucky, Lexing-ton, 1987.

5. Cheng, F., and Goshtasby, A. A parallel B-spline surface fitting algo-rithm. ACM Trans. Graphics 8, 1 (1989), 41–50.

6. Cheng, F., Wasilkowski, G. W., Wang, J., Zhang, C., and Wang, W. Parallel B-spline surface interpolation on a mesh-connected processor array. J. Parallel Distrib. Comput. 24, 2 (1995), 224–229.

(6)

17. Goshtasby, A., Cheng, F., and Barsky, B. A. B-spline curves and surfaces viewed as digital filters. Comput. Vision Graphics Image

Process. 52, 2 (1990), 264–275.

18. Hockney, R. W. A fast direct solution of Poisson’s equation using Fourier analysis. J. Assoc. Comput. Mach. 12 (1965), 95–113. 19. Leighton, F. T. Introduction to Parallel Algorithms and Architectures:

Arrays, Trees, Hypercubes. Morgan Kaufmann, San Mateo, CA, 1992.

20. Lin, F. C., and Chung, K. L. Cost-optimal B-spline surface fitting on hypercube. Inter. Conf. on Parallel Process. 1990, pp. III–185–192. 21. Sun, X. H., Zhang, H., and Ni, L. Efficient tridiagonal solvers on

multicomputers. IEEE Trans. Comput. 41, 3 (1992), 286–296. 22. Wang, H. H. A parallel method for tridiagonal equations. ACM

Trans. Math. Software 7, 2 (1981), 170–183.

23. Yan, W. M., and Chung, K. L. A fast algorithm for solving special tridiagonal systems. Computing 52 (1994), 203–211.

KUO-LIANG CHUNG received the B.S., M.S., and Ph.D. in computer science and information engineering from the National Taiwan University of the Republic of China. He is now a professor in the Department of Information Management of the National Taiwan Institute of Technology. His current research interests include machine vision, computer graphics, data compression, parallel and distributed computing, and matrix compu-tations. He is a member of ACM, IEEE, and SIAM.

WEN-MING YAN received the B.S. and M.S. in mathematics from the National Taiwan University of the Republic of China. He is now a lecturer in the Department of Computer Science and Information Engi-neering of the National Taiwan University. His current research interests include parallel computing and matrix computations.

7. Chung, K. L., and Lin, F. C. A cost-optimal parallel algorithm for B-spline surface fitting. CVGIP: Graph. Models Image Process. 53, 6 (1991), 601–605.

8. Chung, K. L., and Shen, L. Z. Vectorized algorithm for B-spline curve fitting on CRAY X-MP EA/16se. Proc. of Supercomputing. IEEE Computer Society, Silver Spring, MD, 1992, pp. 166–169. 9. Chung, K. L., Peng, Y. C., and Yan, W. M. Vectorization of

B-spline surface fitting. 1st Workshop on Computer Graphics. Academia Sinica, Republic of China, 1993, pp. 21–26.

10. Chung, K. L., and Yan, W. M. Computing quadratic B-spline curve fitting on CRAY X-MP. Proc. National Computer Symposium, Chi-ayi, Republic of China, 1993, pp. 401–407.

11. Chung, K. L., and Chang, H. W. Novel pipelining and processor allocation strategy for monoid computations on unshuffle-exchange network. Parallel Process. Lett. 3, 2 (1993), 189–193.

12. Chung, K. L. On parallel algorithm for B-spline surface fitting. Tech. Report, Dept. of Inform. Mgmt., National Taiwan Inst. of Tech., Apr., 1994.

13. Chung, K. L., and Yan, W. M. Parallel B-spline surface fitting on mesh. Tech. Report, Department of Information Management, National Taiwan Institute of Technology, Dec. 1994.

14. Chung, K. L., Yan, W. M., and Wu, J. G. A parallel algorithm for solving special tridiagonal systems on ring networks. Computing, in press.

15. de Boor, C. A Practical Guide to Splines. Springer-Verlag, New York, 1978.

16. Golub, G. H., and Van Loan, C. F. Matrix Computations. 2nd ed., Chap. 4. The Johns Hopkins University Press, Baltimore, 1989. Received January 13, 1995; revised March 22, 1996; accepted March