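National Science Council, Executive Yuan: Research Project Final Report

A Study of Distributed Sparse Matrix QR Factorization

The Study of Reordering Problem for Sparse QR Factorization

Project number: NSC 87-2213-E-002-005

Project period: August 1, 1997 to July 31, 1999

Principal investigator: 陳俊良 (C. L. Chen), Department of Computer Science and Information Engineering, National Taiwan University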

[email protected]

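I. Chinese Abstract

This research studies the multifrontal QR factorization of sparse matrices. We propose a method to compute the number of operations needed to factorize each frontal matrix. These numbers can be regarded as the weights of the corresponding elimination tree nodes. Such data are very useful when developing a good sparse QR factorization program. For example, using these data, the parallel execution time of QR factorization can be shortened by about 10%.

Keywords: numerical linear algebra, sparse matrix, QR factorization, multifrontal method, least-squares problem

Abstract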

This research considers sparse multifrontal QR factorization. An efficient method is proposed to evaluate the number of multiplicative operations needed to factorize each frontal matrix. These numbers can be treated as the node costs of the corresponding elimination tree, and this knowledge is very useful for improving the performance of sparse QR factorization. For example, experiments conducted so far show that the parallel execution time can be reduced by about 10%.

Keywords: numerical linear algebra, sparse matrix, QR factorization, multifrontal method, least-squares problem

II. Motivation and Objectives

For a matrix A, there is an orthogonal matrix Q such that

A = Q R

where R is upper triangular. This is the QR factorization of A, and the matrix R is called the R-factor of A. QR factorization is a very useful tool for solving many numerical linear algebra problems, e.g., the least-squares problem. In real applications, the matrix A is usually large and sparse.
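To make the factorization concrete, the following is a minimal dense sketch in plain Python that builds Q and R with Householder reflections. The function name `householder_qr` and the small test matrix are illustrative assumptions. The code is dense and ignores sparsity, so it is not the multifrontal method studied in this report.

```python
import math


def householder_qr(a):
    """Return (q, r) with a = q * r for a dense list-of-rows matrix a (m >= n).

    q is m-by-m orthogonal and r is m-by-n upper triangular.
    Hypothetical helper for illustration only.
    """
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        # Column k from the diagonal downward.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # nothing to eliminate in this column
        # Householder vector v = x + sign(x0) * ||x|| * e1.
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        vtv = sum(t * t for t in v)
        # r <- H r with H = I - 2 v v^T / (v^T v), acting on rows k..m-1.
        for j in range(n):
            s = 2.0 * sum(v[i] * r[k + i][j] for i in range(len(v))) / vtv
            for i in range(len(v)):
                r[k + i][j] -= s * v[i]
        # q <- q H, acting on columns k..m-1, so that q * r stays equal to a.
        for row in q:
            s = 2.0 * sum(row[k + i] * v[i] for i in range(len(v))) / vtv
            for i in range(len(v)):
                row[k + i] -= s * v[i]
    return q, r


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q, r = householder_qr(a)
```

For a least-squares problem min ||Ax - b||, one would then solve the triangular system formed by the first n rows of R and of Q^T b.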

An elimination tree is usually used to represent the process of sparse QR factorization. Each node represents a task, and the workload differs from task to task. For example, Figure 1(d) shows an elimination tree whose nodes have different costs. Information about node costs is very useful. For example, consider Figure 1(d) and assume that there are 2 processors. The best task allocation strategy is to assign tasks 1, 2, 3 and 4 to one processor and task 5 to the other.

The goal of this research is to design an efficient method to evaluate node costs of the elimination tree associated with a sparse multifrontal QR factorization.

III. Results and Discussion

(1) Background

Consider the matrix A shown in Figure 1(a). The corresponding $A^T A$, the fill-in graph and the elimination tree are shown in Figures 1(b), 1(c) and 1(d), respectively. The frontal matrices, the associated Householder transformations and the update matrices are drawn in Figure 2. Here, H(m, n) represents a Householder transformation applied to an m by n dense matrix. It can be proved that an H(m, n) needs (2mn + m + n) multiplicative operations when m ≥ 2 and n > 1, and no operations otherwise. We treat this number as the cost of an H(m, n), denoted CH(m, n). According to Figure 2, we can calculate all of the node costs. For example, the cost of node 1 is CH(3, 4) + CH(2, 3) = 31 + 17 = 48. These costs are shown in Figure 1(d).
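The cost function can be written down directly. The sketch below is a small Python helper whose name `ch` is an assumption. The row threshold is taken as m ≥ 2, also an assumption, chosen so that CH(2, 3) = 17 as in the node 1 example. The column condition n > 1 is kept as stated.

```python
def ch(m, n):
    """Multiplicative-operation count CH(m, n) of one Householder step H(m, n).

    Assumption: the step costs 2mn + m + n when m >= 2 and n > 1, else 0.
    """
    if m >= 2 and n > 1:
        return 2 * m * n + m + n
    return 0


# Cost of node 1 from the example: CH(3, 4) + CH(2, 3).
node1_cost = ch(3, 4) + ch(2, 3)
```

Here `ch(3, 4)` is 31 and `ch(2, 3)` is 17, so `node1_cost` reproduces the value 48 given for node 1.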

From the above example, the key to evaluating node costs is knowing the dimensions of all the small dense matrices to which Householder transformations are applied. One possible way to collect these data is to profile the numerical factorization, but the time complexity would then be the same as that of the numerical factorization itself. The following definitions, lemmas and theorems are required to establish an efficient cost evaluation method.

(2) Cost Evaluation

Consider a matrix A with m rows and n columns, m ≧ n.

Definition 1: For matrix A, let $l_j^{0}$ be the number of rows whose leading nonzeros are in column j.

Let $m_j$ and $n_j$ be the numbers of rows and columns of the frontal matrix $F_j$, respectively, and let $v_{p(j,1)}, v_{p(j,2)}, \ldots, v_{p(j,n_j)}$ be the nodes in $\{v_j\} \cup \mathrm{Madj}(v_j)$, where $1 \le p(j,1) < p(j,2) < \cdots < p(j,n_j) \le n$.

Definition 2: For a frontal matrix $F_j$, let $m_j^{p(j,i)}$ be the number of rows whose leading nonzeros are in columns $p(j,1), p(j,2), \ldots, p(j,i)$.

Theorem 1: The cost to factorize frontal matrix $F_j$ is

$$CF_j = \sum_{i=1}^{n_j} CH\left(m_j^{p(j,i)} - i + 1,\; n_j - i + 1\right)$$
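Reading Theorem 1 as CF_j = Σ_{i=1..n_j} CH(m_j^{p(j,i)} − i + 1, n_j − i + 1), the frontal cost follows from the cumulative row counts alone. The sketch below is in Python. The names `ch` and `frontal_cost` are assumptions, as is the threshold m ≥ 2 inside `ch`. The counts `[3, 3, 3, 3]` are hypothetical values chosen to be consistent with the node 1 example (a front with 3 rows and 4 columns).

```python
def ch(m, n):
    """CH(m, n): assumed to be 2mn + m + n when m >= 2 and n > 1, else 0."""
    return 2 * m * n + m + n if m >= 2 and n > 1 else 0


def frontal_cost(m_cum):
    """Cost CF_j of one frontal matrix.

    m_cum[i - 1] holds m_j^{p(j,i)}, the number of rows of F_j whose leading
    nonzeros lie in the first i columns of the front; n_j = len(m_cum).
    """
    n_j = len(m_cum)
    return sum(ch(m_cum[i - 1] - i + 1, n_j - i + 1) for i in range(1, n_j + 1))


# Hypothetical counts matching node 1: all 3 rows start in the first column.
cf_node1 = frontal_cost([3, 3, 3, 3])
```

The terms are CH(3, 4), CH(2, 3), CH(1, 2) and CH(0, 1); the last two are zero, so `cf_node1` is 48.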

Definition 3: For an update matrix $U_j$, let $l_j^{p(j,i)}$ be the number of rows whose leading nonzeros are in column $p(j,i)$, for $2 \le i \le n_j$.

Theorem 2: For a node $v_j$,

$$m_j^{p(j,i)} = \begin{cases} l_j^{0} + h_j^{p(j,1)} & \text{for } i = 1, \\ m_j^{p(j,i-1)} + h_j^{p(j,i)} & \text{for } 2 \le i \le n_j. \end{cases}$$

Theorem 3: For a node $v_j$,

$$l_j^{p(j,i)} = \min\left(m_j^{p(j,i)},\, i\right) - \min\left(m_j^{p(j,i-1)},\, i - 1\right).$$

(3) Evaluation Algorithm

Combining Theorems 1 to 3, Algorithm 2 evaluates the cost of factorizing each frontal matrix, i.e., the cost of each node.
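Algorithm 2 itself is not reproduced here, so the following Python sketch is only one way the three theorems might be combined in a single bottom-up pass over the elimination tree. It rests on several assumptions: h_j^{p(j,i)} is taken to be the number of rows that the update matrices of the children of v_j contribute with leading nonzero in column p(j,i); every child has a smaller index than its parent; the row threshold in CH is m ≥ 2; and the names `node_costs`, `struct`, `parent` and `l0`, as well as the 7-by-4 example structure, are hypothetical.

```python
def ch(m, n):
    """CH(m, n): assumed to be 2mn + m + n when m >= 2 and n > 1, else 0."""
    return 2 * m * n + m + n if m >= 2 and n > 1 else 0


def node_costs(struct, parent, l0):
    """Evaluate the cost of every elimination tree node without numerical work.

    struct[j]: sorted columns p(j,1) < ... < p(j,n_j) of front F_j
    parent[j]: parent of node j, or None for a root
    l0[j]:     rows of A whose leading nonzero is in column j (Definition 1)
    """
    # h[j][c]: rows passed up by children of j with leading nonzero in column c
    # (assumed meaning of h in Theorem 2).
    h = {j: {} for j in struct}
    costs = {}
    for j in sorted(struct):  # children are visited before their parents
        cols = struct[j]
        n_j = len(cols)
        # Theorem 2: cumulative row counts m_j^{p(j,i)}.
        m_cum, total = [], l0[j]
        for c in cols:
            total += h[j].get(c, 0)
            m_cum.append(total)
        # Theorem 1: cost of factorizing F_j.
        costs[j] = sum(ch(m_cum[i - 1] - i + 1, n_j - i + 1)
                       for i in range(1, n_j + 1))
        # Theorem 3: rows of the update matrix U_j, handed to the parent.
        if parent[j] is not None:
            up = h[parent[j]]
            for i in range(2, n_j + 1):
                rows = min(m_cum[i - 1], i) - min(m_cum[i - 2], i - 1)
                if rows:
                    up[cols[i - 1]] = up.get(cols[i - 1], 0) + rows
    return costs


# Hypothetical 7-by-4 structure: 3, 1, 2 and 1 rows start in columns 1 to 4.
struct = {1: [1, 2, 4], 2: [2, 4], 3: [3, 4], 4: [4]}
parent = {1: 2, 2: 4, 3: 4, 4: None}
l0 = {1: 3, 2: 1, 3: 2, 4: 1}
costs = node_costs(struct, parent, l0)
```

For this input the node costs are 36, 12, 12 and 0. The root costs 0 because its front has a single column and CH is taken to be zero unless n > 1. The pass touches only the symbolic structure, which is why it avoids the cost of profiling the numerical factorization.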

(4) Application

In a multiprocessor environment, processor allocation and task scheduling are important issues for achieving high performance. Knowledge of node costs appears to be useful for QR factorization, so we conducted some experiments.

First, we modified Kan's processor allocation and task scheduling algorithm [2] so that it is suitable for QR factorization. Then, we applied the algorithm to several test matrices selected from the Harwell-Boeing collection and estimated the parallel execution time. The process contains two parts. In the first part, node costs are not available; that is, we assume that the costs of all nodes are equal. In the second part, node costs are available. Figure 3 shows our experimental results. In Figure 3, the estimated execution time is normalized: the normalized execution time is 1 if there is an infinite number of processors and the system is communication-free. Experiments conducted so far show that the execution time can be reduced by about 10%.
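One reading of the normalization is that the baseline equals the weighted critical path of the elimination tree, since with unlimited processors and no communication a node can start as soon as all of its children finish. The Python sketch below computes that baseline. It is not Kan's algorithm; the names `critical_path` and `normalized_time`, the tree and the node costs are all hypothetical.

```python
def critical_path(parent, cost):
    """Longest weighted leaf-to-root chain of an elimination tree.

    Assumes every child has a smaller index than its parent.
    """
    longest_below = {j: 0 for j in parent}  # latest finish among children
    best = 0
    for j in sorted(parent):
        finish = longest_below[j] + cost[j]
        best = max(best, finish)
        p = parent[j]
        if p is not None:
            longest_below[p] = max(longest_below[p], finish)
    return best


def normalized_time(estimated, parent, cost):
    """Estimated execution time divided by the ideal (critical-path) time."""
    return estimated / critical_path(parent, cost)


# Hypothetical tree and costs; with one processor the time is the total work.
parent = {1: 2, 2: 4, 3: 4, 4: None}
cost = {1: 36, 2: 12, 3: 12, 4: 0}
ideal = critical_path(parent, cost)
serial = normalized_time(sum(cost.values()), parent, cost)
```

With these numbers the critical path is the chain 1, 2, 4 with length 48, and the single-processor normalized time is 60 / 48 = 1.25. If all nodes were given cost 1 instead, the same tree would have a critical path of 3, which illustrates how equal weights can hide where the real work lies.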

(5) Conclusion

We have proposed an efficient method to evaluate the node costs of the elimination tree associated with a sparse multifrontal QR factorization. Knowledge of node costs is very useful for studying many related problems, e.g., column reordering, processor allocation and task scheduling for sparse multifrontal QR factorization.

IV. Self-Evaluation of Project Results

Part of the results has been published in the International Conference on Parallel and Distributed Processing Techniques and Applications [1].

V. References

[1] D. M. Jiang and C. L. Chen, “Efficient Cost Evaluation for Sparse Multifrontal QR Factorization,” Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, Nevada, USA, July 1998, pp. 1567-1574.

[2] T. T. Kan and C. L. Chen, “Processor Allocation and Task Scheduling for Parallel Sparse Cholesky Factorization,” Proceedings of the Second International Conference on Parallel and Distributed Computing and Networks, Brisbane, Australia, December 1998, pp. 200-205.

Figure 1.
