Department of Applied Mathematics, College of Science
National Chiao Tung University

Doctoral Dissertation

某些矩陣的高-吳數
Gau-Wu numbers of certain matrices

Student: Hsin-Yi Lee
Advisor: Professor Pei Yuan Wu

June 2014
某些矩陣的高-吳數
Gau-Wu numbers of certain matrices

Student: Hsin-Yi Lee
Advisor: Dr. Pei Yuan Wu

A Thesis
Submitted to the Department of Applied Mathematics, College of Science,
National Chiao Tung University
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy
in
Applied Mathematics

June 2014
Hsinchu, Taiwan, Republic of China
某些矩陣的高-吳數 (Gau-Wu numbers of certain matrices)

Student: Hsin-Yi Lee        Advisor: Professor Pei Yuan Wu

Ph.D. Program, Department of Applied Mathematics, National Chiao Tung University

摘要 (Abstract)

For an n-by-n matrix A, let k(A) denote the maximal number of orthonormal vectors xj such that the values ⟨Axj, xj⟩ lie on the boundary of the numerical range W(A). This number k(A) is called the Gau-Wu number of A. If A is a normal or a quadratic matrix, then its Gau-Wu number k(A) can be computed explicitly. For a matrix A of the form B ⊕ C, we show that its Gau-Wu number equals 2 if and only if the numerical range of one summand, say B, is contained entirely in the interior of the numerical range of the other summand C and k(C) = 2. For an irreducible matrix A, we determine exactly when its Gau-Wu number equals n. These results, together with the known shapes of the numerical ranges of 4-by-4 matrices, can be used to determine the Gau-Wu number of any 4-by-4 reducible matrix.

Moreover, let A be an n-by-n (n ≥ 2) nonnegative matrix of the form

$$\begin{bmatrix} 0 & A_1 & & \\ & 0 & \ddots & \\ & & \ddots & A_{m-1} \\ & & & 0 \end{bmatrix},$$

where m ≥ 3 and the diagonal zeros are zero square matrices. If the real part of A is irreducible, then its Gau-Wu number k(A) is bounded above by m − 1. We also obtain necessary and sufficient conditions for the Gau-Wu number of such a matrix to attain this upper bound. In addition, we study another class of nonnegative matrices, namely the doubly stochastic ones. We prove that the Gau-Wu number of any 3-by-3 doubly stochastic matrix must equal 3. We also determine the numerical range and the Gau-Wu number of every 4-by-4 doubly stochastic matrix. Finally, for a general n-by-n (n ≥ 5) doubly stochastic matrix, we obtain a lower bound for its Gau-Wu number via the possible shapes of its numerical range.
Gau-Wu numbers of certain matrices
Student: Hsin-Yi Lee Advisor: Dr. Pei Yuan Wu
Department of Applied Mathematics
National Chiao Tung University
ABSTRACT
For any n-by-n matrix A, let k(A) stand for the maximal number of orthonormal vectors xj such that the scalar products ⟨Axj, xj⟩ lie in the boundary of the numerical range W(A). This number k(A) is called the Gau-Wu number of the matrix A. If A is a normal or a quadratic matrix, then the exact value of k(A) can be computed. For a matrix A of the form B ⊕ C, we show that k(A) = 2 if and only if the numerical range of one summand, say B, is contained in the interior of the numerical range of the other summand C and k(C) = 2. For an irreducible matrix A, we can determine exactly when the value of k(A) equals the size of A. These are then applied to determine k(A) for a reducible matrix A of size 4 in terms of the shape of W(A).

Moreover, if A is an n-by-n (n ≥ 2) nonnegative matrix of the form

$$\begin{bmatrix} 0 & A_1 & & \\ & 0 & \ddots & \\ & & \ddots & A_{m-1} \\ & & & 0 \end{bmatrix},$$

where m ≥ 3 and the diagonal zeros are zero square matrices, with irreducible real part, then k(A) has an upper bound m − 1. In addition, we also obtain necessary and sufficient conditions for k(A) = m − 1 for such a matrix A. The other class of nonnegative matrices we study is the doubly stochastic ones. We prove that the value of k(A) is equal to 3 for any 3-by-3 doubly stochastic matrix A. Next, for any 4-by-4 doubly stochastic matrix, we also determine its numerical range. This result can be applied to find the value of k(A) for any doubly stochastic matrix A of size 4 in terms of the shape of W(A). Furthermore, a lower bound of k(A) is also found for a general n-by-n (n ≥ 5) doubly stochastic matrix A via the possible shapes of W(A).
Acknowledgements

First and foremost, I would like to express my special gratitude to my advisor, Professor Pei Yuan Wu (吳培元), who over these five-plus years has constantly helped me in every way, both in my coursework and in my research. Beyond his academic instruction, I have also learned from him how to conduct myself and how to approach things, and I am deeply grateful for everything he has taught me. I also thank the members of my oral examination committee, Professors 黃毅青, 簡茂丁, 高華隆, and 王國仲, whose comments made this thesis more complete.

Throughout my doctoral studies I benefited from the guidance of many professors and the help of the assistants in the Department of Applied Mathematics at National Chiao Tung University. In particular, Professor 王國仲 gave valuable suggestions and comments on the Gau-Wu number studied within the numerical range, which let me see my own blind spots and solve a problem for one class of matrices. I also thank Chair Professor 林文偉 and Professors 李明佳, 王夏聲, 林琦焜, 白啓光, 葉立明, 許義容, and 張書銘, among others, for their assistance with my studies and in other respects. I am grateful as well to Professor 高華隆 of National Central University, whose help and encouragement while I was working on problems about the numerical range gave me an entirely new perspective on this field.

Without the help of my senior fellow students it would have been hard to get through this long doctoral journey. I thank Professor 張其棟 of Feng Chia University for teaching me how to use plotting software, and I am especially grateful to Dr. 蔡明誠 of National Sun Yat-sen University for countless nights of video calls, checking my proofs on the one hand and offering constant encouragement on the other, which brought home to me the saying 『問松林,松林幾經冬?山川何如昔,風雲與古同。』 I also thank the seniors, classmates, and juniors of the department, including Professor 吳恭儉 of National Kaohsiung Normal University, Dr. 呂明杰, Dr. 黃韋強, Dr. 黃皜文, Dr. 李忠逵, 陳德軒, 龔柏任, 黃俊銘, and 陳哲楷, as well as the many friends I came to know online (too numerous to list); their company along the way made my research life colorful.

Finally, I especially thank my family: my parents, my beloved wife 文鳳, my sisters, and my dear daughter 采潔, for their unfailing support, selfless devotion, and care, which allowed me to concentrate on my research. While my daughter was still very young, my parents and my wife looked after her attentively, so that I did not have to worry about daily chores or about the children and could complete my doctorate smoothly. I dedicate this thesis to them.

Contents
Chinese Abstract ……… i
English Abstract ……… ii
Acknowledgements ……… iii
Contents ……… iv
1  Introduction ……… 1
2  Gau-Wu numbers of direct sums of matrices ……… 5
   2.1  Introduction ……… 5
   2.2  Direct sum ……… 6
   2.3  Applications and discussions ……… 21
3  Gau-Wu numbers of nonnegative matrices ……… 30
   3.1  Introduction ……… 30
   3.2  Nonnegative block shift matrix ……… 31
   3.3  Doubly stochastic matrix ……… 37
4  References ……… 43
1  Introduction
Let A be an n-by-n complex matrix. Its numerical range W(A) is, by definition, the set {⟨Ax, x⟩ : x ∈ C^n, ‖x‖ = 1}, where ⟨·, ·⟩ and ‖·‖ denote the standard inner product and its associated norm in C^n, respectively. One of the most important
properties of the numerical range is its convexity. In fact, the study of the numerical range originates from the discovery of this property by Toeplitz [17] and Hausdorff [7]: the former proved that the boundary of the numerical range is always a convex curve, but left open the possibility that it may have interior holes while the latter, using a different approach, showed that this cannot happen. An interesting account on the history of this theorem can be found in [6].
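The definition above lends itself to a direct numerical illustration. The following is a minimal sketch (assuming NumPy is available; the sample matrix, the sample size, and the function name are illustrative choices, not part of the thesis) that approximates W(A) by evaluating ⟨Ax, x⟩ at random unit vectors, from which the convexity guaranteed by the Toeplitz-Hausdorff theorem can be inspected.

import numpy as np

def sample_numerical_range(A, samples=20000, seed=0):
    """Approximate W(A) = { <Ax, x> : ||x|| = 1 } by random unit vectors in C^n."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = rng.standard_normal((samples, n)) + 1j * rng.standard_normal((samples, n))
    X /= np.linalg.norm(X, axis=1, keepdims=True)          # normalize each sample to a unit vector
    return np.einsum('ij,jk,ik->i', X.conj(), A, X)        # x^* A x for every sample x

A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=complex)                   # an illustrative 3-by-3 matrix
points = sample_numerical_range(A)
print(points[:5])                                          # a few points of (an approximation of) W(A)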
For a matrix A, let A* denote its adjoint, Re A its real part (A + A*)/2, and Im A its imaginary part (A − A*)/2i. The set of eigenvalues of A is denoted by σ(A). For any subset △ of C, △∧ denotes its convex hull, that is, △∧ is the smallest convex set containing △. We list below several important properties of the numerical range.

(1) W(U*AU) = W(A) for any unitary matrix U.
(2) W(A) is a compact subset of C.
(3) W(aA + bI) = aW(A) + b for any scalars a and b.
(4) W(Re A) = Re W(A) and W(Im A) = Im W(A).
(5) If A = [B ∗; ∗ ∗] in block form, then W(B) ⊆ W(A).
(6) σ(A) ⊆ W(A).
(7) If A is normal, then W(A) is equal to σ(A)∧.
(8) W(⊕n An) = (∪n W(An))∧.
For other properties of the numerical range, the reader may consult [8, Chapter 1].
In Chapter 2, we consider the maximum number k = k(A) for which there exist orthonormal vectors x1, ..., xk ∈ C^n with ⟨Axj, xj⟩ in the boundary ∂W(A) of W(A) for all j. Note that k(A) is also the maximum size of a compression of A with all its diagonal entries in ∂W(A). Recall that a k-by-k matrix B is a compression of A if B = V*AV for some n-by-k matrix V with V*V = Ik. Here Ik denotes the k-by-k identity matrix. In particular, if n equals k, then A and B are said to be unitarily similar, which we denote by A ≅ B. The number k(A) was introduced in [5] and [19] and is called the Gau-Wu number by [2]. It relates properties of the numerical range to the compressions of A. In particular, it was shown in [5, Lemma 4.1 and Theorem 4.4] that 2 ≤ k(A) ≤ n for any n-by-n (n ≥ 2) matrix A, and k(A) = ⌈n/2⌉ for any Sn-matrix A (n ≥ 3). Recall that an n-by-n matrix A is of class Sn if it is a contraction, that is, ‖A‖ ≡ max_{‖x‖=1} ‖Ax‖ ≤ 1, its eigenvalues are all in the open unit disc D ≡ {z ∈ C : |z| < 1}, and the rank of In − A*A equals one. In [19, Theorem 3.1], it was proven that, for an n-by-n (n ≥ 2) weighted shift matrix A with weights w1, ..., wn, k(A) = n if and only if either |w1| = · · · = |wn| or n is even and |w1| = |w3| = · · · = |w_{n−1}| and |w2| = |w4| = · · · = |wn|. Recall that an n-by-n (n ≥ 2) matrix of the form

$$\begin{bmatrix} 0 & w_1 & & \\ & 0 & \ddots & \\ & & \ddots & w_{n-1} \\ w_n & & & 0 \end{bmatrix}$$

is called a weighted shift matrix with weights w1, ..., wn. Moreover, in [2] k(A) is
computed for two classes of n-by-n matrices as follows. An n-by-n matrix A is almost
normal if it has n − 1 orthogonal eigenvectors. Note that every almost normal matrix
is unitarily similar to An ⊕ Aa, where An is normal while Aa is almost normal and unitarily irreducible (cf. [14]). Recall that a matrix A is unitarily reducible if and
only if A is unitarily similar to A1⊕ A2 for some lower-dimensional matrices A1 and
A2; otherwise, A is unitarily irreducible. In [2, Theorem 3], it was proven that, for
any almost normal matrix A, k(A) = l1 + l2, where l1 is the number of eigenvalues of An located on ∂W(A), counting their multiplicities, and

l2 = 0 if W(Aa) lies in the interior of W(An),
     2 if there exist distinct parallel supporting lines of W(A) passing through points of W(Aa), or
     1 otherwise.

Furthermore, [2, Theorem 5] shows that if A is an n-by-n (n ≥ 3) tridiagonal Toeplitz matrix of the form

$$\begin{bmatrix} a & c & & & 0 \\ b & a & c & & \\ & b & \ddots & \ddots & \\ & & \ddots & \ddots & c \\ 0 & & & b & a \end{bmatrix},$$

then k(A) = n if |b| = |c|, and k(A) = ⌈n/2⌉ otherwise.
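The case |b| = |c| of the tridiagonal Toeplitz result can be checked in its simplest instance b = c (real), where the matrix is Hermitian: W(A) is then the segment [λmin, λmax], every point of which is a boundary point, so the n orthonormal eigenvectors already witness k(A) = n. A minimal numerical sketch (NumPy assumed; the size and entries are arbitrary illustrative choices):

import numpy as np

n, a, b = 5, 2.0, 1.0                                             # illustrative parameters with b = c
A = a * np.eye(n) + b * (np.eye(n, k=1) + np.eye(n, k=-1))        # Hermitian tridiagonal Toeplitz matrix

eigvals, eigvecs = np.linalg.eigh(A)                              # orthonormal eigenvectors x_1, ..., x_n
values = [eigvecs[:, j] @ A @ eigvecs[:, j] for j in range(n)]    # <A x_j, x_j> = lambda_j
print(np.allclose(values, eigvals))                               # True: each value lies in W(A) = [min, max]
print(np.allclose(eigvecs.T @ eigvecs, np.eye(n)))                # True: the x_j are orthonormal, so k(A) = n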
We will show that if A is a normal or a quadratic matrix, then the exact value of k(A) can be computed. Recall that a quadratic matrix A is one which satisfies A² + z1A + z2I = 0 for some scalars z1 and z2. For a matrix A of the form B ⊕ C,
we show that k(A) = 2 if and only if the numerical range of one summand, say, B is contained in the interior of the numerical range of the other summand C and k(C) = 2. For an irreducible matrix A, we can determine exactly when the value of k(A) equals the size of A. These are then applied to determine k(A) for a reducible matrix A of size 4 in terms of the shape of W (A). These results also appeared in [10].
In Chapter 3, we continue to study k(A) for two classes of n-by-n nonnegative matrices A. Recall that an n-by-n matrix A = [a_ij]_{i,j=1}^{n} is a nonnegative matrix, denoted by A ≥ 0, if a_ij ≥ 0 for all i and j. Recall also that a square matrix P is a permutation matrix if each row and each column of P has exactly one entry equal to 1 and all other entries are 0. Note that any permutation matrix P is unitary with P* = P^T = P^{−1}. Two square matrices A and B of the same size are permutationally similar if there is a permutation matrix P such that P^T A P = B, which is denoted by A ≅ₚ B. A matrix A is permutationally reducible if it is permutationally similar to a matrix of the form

$$\begin{bmatrix} B & C \\ 0 & D \end{bmatrix},$$

where B and D are square matrices; otherwise, A is permutationally irreducible. This should not be confused with the notion of unitarily reducible (resp., irreducible) matrix. For nonnegative matrices, reducibility (resp., irreducibility) in general refers to the permutational one. Note that the reducibility (or irreducibility, for that matter) of nonnegative matrices is preserved under permutational similarity, and the irreducibility of a nonnegative matrix A passes to that of Re A. The converse of the latter is false, as witnessed by A = [0 1; 0 0]. If A is an n-by-n (n ≥ 2) nonnegative matrix of the form

$$\begin{bmatrix} 0 & A_1 & & \\ & 0 & \ddots & \\ & & \ddots & A_{m-1} \\ & & & 0 \end{bmatrix},$$

where m ≥ 3 and the diagonal zeros are zero square matrices, with irreducible real part, then k(A) has an upper bound m − 1. In addition, we also obtain necessary and sufficient conditions for k(A) = m − 1 for such a matrix A. The other class of nonnegative matrices we study is the doubly stochastic ones. Recall that an n-by-n nonnegative matrix A is doubly stochastic if its row sums and column sums are all equal to one. It is proven that the value of k(A) can be determined for any doubly stochastic matrix A of size 3 or 4 in terms of the shape of W(A). Note that the shapes of W(A) can be determined completely by the tests given in [1, Theorems 1 and 3]. Moreover, a lower bound of k(A), in general, is also found for an n-by-n (n ≥ 5) doubly stochastic matrix via the possible shapes of W(A).
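Permutational irreducibility can be tested from the directed graph of the matrix: a nonnegative n-by-n matrix M is irreducible exactly when (I + M)^{n−1} is entrywise positive. The sketch below (NumPy assumed; the function name is an illustrative choice) applies this test to the 2-by-2 witness mentioned above, confirming that A is reducible while Re A is irreducible.

import numpy as np

def is_irreducible(M):
    """A nonnegative matrix M is permutationally irreducible iff (I + M)^(n-1) > 0 entrywise."""
    n = M.shape[0]
    P = np.linalg.matrix_power(np.eye(n) + (M > 0).astype(float), n - 1)
    return bool(np.all(P > 0))

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])                                # the witness from the text
print(is_irreducible(A), is_irreducible((A + A.T) / 2))   # False True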
2  Gau-Wu numbers of direct sums of matrices

2.1  Introduction
In Section 2.2 below, we first determine the value of k(A) for a normal matrix A (Proposition 2.2.1). Then we consider the direct sum A = B ⊕ C, where the numerical ranges W (B) and W (C) are assumed to be disjoint. In this case, we show that the value of k(A) is equal to the sum of k1(B) and k1(C) (Theorem 2.2.2), where
k1(B) and k1(C) are defined as follows. We define k1(B) to be the maximum number
k for which there are orthonormal vectors x1, . . . , xk in C^n such that ⟨Bxi, xi⟩ is in
∂W (A) ∩ ∂W (B) for all i = 1, . . . , k, and similarly for k1(C). Based on the proof of
Theorem 2.2.2, we obtain the same formula for k(A) under a slightly weaker condition on B and C (Theorem 2.2.6). In Section 2.3, we give some applications of Theorem 2.2.6. The first one (Proposition 2.3.1) shows that the equality k(A) = k1(B) + k1(C)
holds for a matrix A of the form B ⊕ C with normal C. In particular, we are able to determine the value of k(A) for any 4-by-4 reducible matrix A (Corollary 2.3.4 and Propositions 2.3.7 − 2.3.9). Moreover, the number k(A ⊕ (A + aIn)) can be
determined for any n-by-n matrix A and nonzero complex number a (Proposition 2.3.10). At the end of Section 2.3, we propose several open questions on k(B ⊕ C) and give a partial answer for one of them (Proposition 2.3.11). That is, the equality k(⊕_{j=1}^{m} A) = m · k(A) holds if the dimension of Hξ(A) equals one for each ξ ∈ ∂W(A),
where the subspace Hξ(A) is defined in the first paragraph of Section 2.2. By using
this, we can determine the value of k(A) for a quadratic matrix A (Corollary 2.3.12).
Note that all of the results in Sections 2.2 and 2.3 have also appeared in [10].
We end this section by fixing some notation. A matrix A is positive definite, denoted by A > 0, if A is Hermitian and ⟨Ax, x⟩ > 0 for all x ≠ 0. In is the n-by-n identity matrix. The n-by-n diagonal matrix with diagonal entries ξ1, ..., ξn is denoted by diag(ξ1, ..., ξn). The cardinal number of a set S is #(S). The notation δij is the Kronecker delta, that is, δij has the value 1 if i = j, and the value 0 otherwise.
The span of a nonempty subset S of a vector space V , denoted by span (S), is the subspace consisting of all linear combinations of the vectors in S.
2.2  Direct sum
We start by reviewing a few basic facts concerning the boundary points of a numerical range. For an n-by-n matrix A, a point ξ in ∂W(A), and a supporting line L of W(A) which passes through ξ, there is a θ in [0, 2π) such that the ray from the origin which forms angle θ with the positive x-axis is perpendicular to L. In this case, Re(e^{−iθ}ξ) is the maximum eigenvalue of Re(e^{−iθ}A) with the corresponding eigenspace E_{ξ,L}(A) ≡ ker Re(e^{−iθ}(A − ξIn)). Let Kξ(A) denote the set {x ∈ C^n : ⟨Ax, x⟩ = ξ‖x‖²} and Hξ(A) the subspace spanned by Kξ(A). If the matrix A is clear from the context, we will abbreviate these to E_{ξ,L}, Kξ, and Hξ, respectively. For other related properties,
we refer the reader to [4, Theorem 1] and [19, Proposition 2.2]. The next proposition on the value of k(A) for a normal matrix A is an easy consequence of [19, Lemma 2.9]. It can be regarded as a motivation for our study of this topic.
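These facts give a concrete recipe for producing boundary points and the associated vectors: for each direction θ, the maximal eigenvalue of Re(e^{−iθ}A) is the value of the support function of W(A), and a corresponding unit eigenvector x yields a point ⟨Ax, x⟩ of ∂W(A) lying on the supporting line perpendicular to that direction, that is, a vector of Kξ(A). A minimal sketch (NumPy assumed; the matrix and the angles are arbitrary illustrative choices):

import numpy as np

def boundary_point(A, theta):
    """Return a unit vector x and the boundary point <Ax, x> of W(A) attained
    on the supporting line perpendicular to the direction theta."""
    H = (np.exp(-1j * theta) * A + (np.exp(-1j * theta) * A).conj().T) / 2   # Re(e^{-i theta} A)
    eigvals, eigvecs = np.linalg.eigh(H)
    x = eigvecs[:, -1]                                   # eigenvector of the maximal eigenvalue
    return x, x.conj() @ A @ x

A = np.array([[0, 1, 0],
              [0, 0, 2],
              [0, 0, 0]], dtype=complex)
for theta in np.linspace(0, 2 * np.pi, 6, endpoint=False):
    x, xi = boundary_point(A, theta)
    print(round(float(theta), 2), np.round(xi, 4))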
Proposition 2.2.1. If A is an n-by-n normal matrix with p eigenvalues (counting multiplicity) in ∂W (A), then k(A) = p.
Proof. We may assume, after a unitary similarity, that A is a matrix of the form
B ⊕ C, where B = diag (λ1, . . . , λp) and C = diag (λp+1, . . . , λn) with λ1, . . . , λp ∈
∂W (A) and λp+1, . . . , λn ∈ int W (B). It follows from [19, Lemma 2.9] that k(A) =
k(B ⊕ C) = k(B) = p.

One of our main results of this section is the following theorem for k(A) when A is a matrix of the form B ⊕ C with disjoint W(B) and W(C). Recall that the value of k1(B) is the maximum number k for which there are orthonormal vectors x1, . . . , xk
in C^n such that ⟨Bxi, xi⟩ is in ∂W(A) ∩ ∂W(B) for all i = 1, . . . , k. If the subset
∂W (A) ∩∂W (B) is empty, then we define k1(B) = 0. The following theorem provides
a formula for determining the value of k(A) by k1(B) and k1(C).
Theorem 2.2.2. Let A = B ⊕ C, where B and C are n-by-n and m-by-m
matrices, respectively. If the numerical ranges W (B) and W (C) are disjoint, then
k(A) = k1(B) + k1(C) ≤ k(B) + k(C). In this case, k(A) = k(B) + k(C) if and only if k1(B) = k(B) and k1(C) = k(C). In particular, k(A) = m + n if and only if
k1(B) = k(B) = n and k1(C) = k(C) = m.
This will be proven after the following lemma which is the case when C equals a 1-by-1 matrix [c].
Recall that z is an extreme point of the convex subset ∆ of C if z belongs to ∆ and cannot be expressed as a convex combination of two other (distinct) points of ∆; otherwise, z is a nonextreme point. Recall also that a point z is a corner of a convex set ∆ of the complex plane if z is in the closure of ∆ and ∆ has two supporting lines passing through z. If A is a finite matrix, ξ = hAx, xi and kxk = 1, then x is called a unit vector corresponding to the point ξ in W (A).
Lemma 2.2.3. If A = B ⊕ [c] is an n-by-n matrix, where B is of size n − 1 and c is a scalar, then k(A) = k1(B) + k1([c]).
Proof. By Proposition 2.2.1, we may assume that the interior of the numerical range W(B) is nonempty. If c lies in the interior of W(B), then k(A) = k(B) by [19, Lemma 2.9]. Obviously, k(B) = k1(B) and k1([c]) = 0 in this case. Hence it remains
to consider the case when c is outside the interior of W (B). That is, we will prove that k(A) = k1(B) + 1 for c /∈ int W (B). By the definition of k(A), there are points
ξj = ⟨Azj, zj⟩ in ∂W(A), j = 1, 2, . . . , k(A), with ⟨zi, zj⟩ = δij for i, j = 1, ..., k(A). Clearly, the inequality k(A) ≥ k1(B) + 1 holds. Assume that k(A) ≥ k1(B) + 2. Let zj = xj ⊕ yj for each j. We claim that every xj is a nonzero vector. Indeed, if xj0 = 0 for some j0, then yj0 ≠ 0 and ⟨zj, zj0⟩ = ⟨yj, yj0⟩ = 0 for all j ≠ j0. This implies that yj = 0 for all j ≠ j0 and thus k1(B) is at least k1(B) + 1, which is absurd. Hence the claim has been proven. From ξj = ⟨Azj, zj⟩ = ‖xj‖²bj + ‖yj‖²c ∈ ∂W(A), where bj = ⟨B(xj/‖xj‖), xj/‖xj‖⟩, it follows that ξj is in the (possibly degenerate) line segment [c, bj], and bj is in the boundary of W(B) for each j. We note that there are at least two nonzero yj's; this is because if otherwise, then we obtain the inequality k1(B) ≥ k1(B) + 1, which is a contradiction. Hence we may assume that y1, ..., yh ≠ 0, where h ≥ 2, and that this h is the maximal such number.
If c is not in W (B), then there are exactly two points p and q in the boundary of W (B) such that the two line segments [c, p] and [c, q] are in the boundary of W (A) and the relative interior of these two line segments are disjoint from the boundary of W (B) by the fact that W (A) is the convex hull of the union of W (B) and the singleton {c}. Hence there are three cases to consider: the intersection of the boundary of W (B) and the supporting line at p (resp., q) containing [c, p] (resp., [c, q]) is (1) {p} (resp., {q}), (2) a line segment [p, p′] (resp., {q}) or {p} (resp., a line segment [q, q′] ), or (3)
a line segment [p, p′] (resp., a line segment [q, q′]) (cf. Figure 2.2.4). We need only
prove case (2) since other cases can be done similarly.
[Figure 2.2.4: the three cases (1), (2), (3) for the position of c outside W(B), showing the tangent segments [c, p] and [c, q] and, where present, the boundary segments [p, p′] and [q, q′] of W(B).]
Define three (disjoint) subsets consisting of the corresponding unit vectors, and their cardinal numbers, respectively, in the following:
R ≡ {zj : ξj ∈ [c, p′)} with r ≡ # (R) ,
S ≡ {zj : ξj ∈ (c, q)} with s ≡ # (S) , and
T ≡ {zj : ξj ∈ ∂W (A)\([c, p′) ∪ (c, q))} with t ≡ # (T ) .
So, k(A) = r + s + t. Obviously, every zj ∈ T is of the form xj ⊕ 0. Moreover, we
partition R into two disjoint subsets R1 ≡ {zj : yj 6= 0} and R2 ≡ {zj : yj = 0}. We
call their cardinal numbers r1 and r2, respectively. Without loss of generality, we
may assume that R1 = {z1, ..., zr1}, R2 = {zr1+1, ..., zr1+r2}, S = {zr+1, ..., zr+s}, and
T = {zr+s+1, ..., zr+s+t}, where r1+ r2 = r. This shows that r1+ s = h ≥ 2.
First assume that s = 0. Then r1 ≥ 2. For the clarity of the proof, the following method is called (∗). Since every yj, j = 1, . . . , r1, is nonzero, we define the vectors z′j = (xj/yj) ⊕ 1 for these j's so that the vectors in

M ≡ {(z′1 − z′j)/‖z′1 − z′j‖}_{j=2}^{r1} = {(((x1/y1) − (xj/yj)) ⊕ 0)/‖z′1 − z′j‖}_{j=2}^{r1}

are linearly independent and are perpendicular to the vectors in T ∪ R2. This together with [4, Theorem 1] shows that span(M) ⊆ ∪_{η∈[c,p′]} Kη(A) and thus every unit vector in span(M) is a unit vector corresponding to some η ∈ ∂W(B). Choosing an orthonormal basis {vj ⊕ 0}_{j=2}^{r1} for the subspace span(M), we deduce from the orthonormality of the vectors in T ∪ R2 ∪ {vj ⊕ 0}_{j=2}^{r1} that

k1(B) ≥ t + r2 + (r1 − 1) = r + s + t − 1 = k(A) − 1 ≥ k1(B) + 1,

which is impossible. Hence we must have s ≥ 1.
If s = 1, then r1 ≥ 1. A similar argument as above yields that

k1(B) ≥ t + r2 + 1 if r1 = 1, and k1(B) ≥ t + r2 + (r1 − 1) + 1 if r1 ≥ 2,

by considering the orthonormal subsets T ∪ R2 ∪ {(x_{r+1}/‖x_{r+1}‖) ⊕ 0} and T ∪ R2 ∪ {vj ⊕ 0}_{j=2}^{r1} ∪ {(x_{r+1}/‖x_{r+1}‖) ⊕ 0}, where {vj ⊕ 0}_{j=2}^{r1} is an orthonormal subset of span(R1), via applying (∗) on R1. The above inequalities imply that

k1(B) ≥ r + s + t − 1 ≥ k(A) − 1 ≥ k1(B) + 1 if r1 = 1, and k1(B) ≥ r + s + t − 1 ≥ k(A) − 1 ≥ k1(B) + 1 if r1 ≥ 2.

This is a contradiction. Hence s ≥ 2.

If r1 = 0, then applying (∗) on S, we reach a contradiction since

k1(B) ≥ t + r2 + (s − 1) = r + s + t − 1 = k(A) − 1 ≥ k1(B) + 1.

If r1 = 1, then we obviously have the linear independence of the subset N ≡ {(z′1 − z′j)/‖z′1 − z′j‖}_{j=r+2}^{r+s} = {(((x1/y1) − (xj/yj)) ⊕ 0)/‖z′1 − z′j‖}_{j=r+2}^{r+s} by applying (∗) on S again. Let {vj ⊕ 0}_{j=r+2}^{r+s} be an orthonormal basis for the subspace span(N). Hence

k1(B) ≥ t + r2 + (s − 1) + 1 = r + s + t − 1 = k(A) − 1 ≥ k1(B) + 1

by the orthonormality of the vectors in T ∪ R2 ∪ {vj ⊕ 0}_{j=r+2}^{r+s} ∪ {(x1/‖x1‖) ⊕ 0}. This is again a contradiction. If r1 ≥ 2, then applying (∗) on S and R1, we have the linear independence of the subsets P ≡ {(z′1 − z′j)/‖z′1 − z′j‖}_{j=r+2}^{r+s} = {(((x1/y1) − (xj/yj)) ⊕ 0)/‖z′1 − z′j‖}_{j=r+2}^{r+s} and Q ≡ {(z′1 − z′j)/‖z′1 − z′j‖}_{j=2}^{r1} = {(((x1/y1) − (xj/yj)) ⊕ 0)/‖z′1 − z′j‖}_{j=2}^{r1}, respectively. Let {vj ⊕ 0}_{j=r+2}^{r+s} be an orthonormal basis for span(P). Then span(P) ⊕ span(x ⊕ y) = span(S) for some unit vector x ⊕ y orthogonal to span(P). Clearly, x is a nonzero vector; this is because if otherwise, then 0 ⊕ y (∈ span(S)) is orthogonal to z1 = x1 ⊕ y1 (∈ R1), which contradicts the fact that y and y1 are nonzero scalars. Let {vj ⊕ 0}_{j=2}^{r1} be an orthonormal basis for the subspace span(Q). Then we conclude that the subset T ∪ R2 ∪ {vj ⊕ 0}_{j=2}^{r1} ∪ {vj ⊕ 0}_{j=r+2}^{r+s} ∪ {(x/‖x‖) ⊕ 0} is orthonormal so that

k1(B) ≥ t + r2 + (r1 − 1) + (s − 1) + 1 = r + s + t − 1 = k(A) − 1 ≥ k1(B) + 1,

which is a contradiction. This completes the proof of case (2).
In case (1), we define three subsets consisting of the corresponding unit vectors, and their cardinal numbers, respectively, as follows:
R ≡ {zj : ξj ∈ [c, p)} with r ≡ # (R) ,
S ≡ {zj : ξj ∈ (c, q)} with s ≡ # (S) , and
T ≡ {zj : ξj ∈ ∂W (A)\([c, p) ∪ (c, q))} with t ≡ # (T ) .
As for case (3), we have
R ≡ {zj : ξj ∈ [c, p′)} with r ≡ # (R) ,
S ≡ {zj : ξj ∈ (c, q′)} with s ≡ # (S) , and
T ≡ {zj : ξj ∈ ∂W (A)\([c, p′) ∪ (c, q′))} with t ≡ # (T ) .
As before, we partition R (resp., S) into two disjoint subsets R1 ≡ {zj : yj 6= 0}
and R2 ≡ {zj : yj = 0} (resp., S1 ≡ {zj : yj 6= 0} and S2 ≡ {zj : yj = 0}). Based
on the arguments for case (2), we get a series of contradictions for each individual case. In a similar fashion, we remark that if A = B ⊕ cIm, where c /∈ W (B), then
k(A) = k1(B) + k1(cIm) = k1(B) + m. This remark will be used in the remaining
part of the proof.
To complete the proof, we let c be in the boundary of W(B). Assume that ∂W(B) contains no line segment. We infer that c = bj = ξj for j = 1, ..., h since these xj and yj are nonzero vectors for such j's and c is an extreme point of W(B). Define a new vector z′j = (xj/yj) ⊕ 1 for each j = 1, ..., h. Then the subset

S ≡ {(z′1 − z′j)/‖z′1 − z′j‖}_{j=2}^{h} = {(((x1/y1) − (xj/yj)) ⊕ 0)/‖z′1 − z′j‖}_{j=2}^{h}
is linearly independent. Since c is an extreme point of W (A), we have Hc(A) =
Kc(A) by [4, Theorem 1] and span (S) is a subspace of Hc(A). Let {vj⊕ 0}hj=2 be
an orthonormal basis for span (S). Then c = hA (vj ⊕ 0) , vj⊕ 0i = hBvj, vji is in
∂W (B) for j = 2, . . . , h. Hence
k(B) ≥ (h − 1) + (k(A) − h) = k(A) − 1 ≥ k(B) + 1.
This is a contradiction. So, we may assume that ∂W (B) contains a line segment l such that c belongs to l. If c is not an extreme point of l, then we infer that c = bj = ξj or ξj ∈ (c, bj) for j = 1, ..., h since xj and yj are nonzero vectors for these
j’s. Hence zj ∈ Hc(A) for j = 1, ..., h by [4, Theorem 1]. Similar arguments show
that Hc(A) has an orthonormal subset {wj⊕ 0}hj=2. Since Hc(A) = ∪η∈lKη(A) by [4,
Theorem 1], this implies that wj ⊕ 0 ∈ Kηj(A), where ηj ∈ l, for j = 2, ..., h. From
ηj = hA (wj⊕ 0) , wj⊕ 0i = hBwj, wji ∈ l ⊆ ∂W (B), where j = 2, ..., h, we reach a
contradiction since
k(B) ≥ (h − 1) + (k(A) − h) = k(A) − 1 ≥ k(B) + 1.
For the remaining part of the proof, let c be an extreme point of l, where l is a line segment on the boundary of W (B). We consider two cases: either (a) there is only one line segment in ∂W (B) passing through c, or (b) there are exactly two line segments in ∂W (B) passing through c. In case (a), since xj and yj are nonzero vectors
for j = 1, ..., h, we infer that c = bj = ξj or ξj ∈ (c, bj) for these j’s. This implies that
zj ∈ Hη(A) by [4, Theorem 1], where η is not an extreme point of l. So, the same
arguments as above lead us to a contradiction. For case (b), since c is a corner of W (B), c is a reducing eigenvalue of B by [3, Theorem 1]. Thus B is unitarily similar to a matrix of the form B′⊕cI
n′, where c is not an eigenvalue of B′, and the size of B′
and n′ are both less than n. Obviously, c /∈ W (B′). We apply the preceding remark
as for the case of c /∈ W (B) to see that k(A) = k(B′ ⊕ cI
n′+1) = k1(B′) + n′ + 1,
and k(B) = k(B′ ⊕ cI
n′) = k1(B′) + n′. In addition, k(B) = k1(B) in this case.
Hence we obtain that k(A) = k1(B) + 1, which contradicts our assumption that
k(A) ≥ k1(B) + 2. With this, we conclude the proof of the asserted equality.
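The situation of Lemma 2.2.3 can be illustrated numerically. In the following sketch (NumPy assumed; the matrices, vectors, and helper name are illustrative choices), B = [0 1; 0 0] has W(B) equal to the closed disc of radius 1/2 centred at 0, and c = 3 lies outside it; the two orthonormal vectors (1, ±i)/√2 ⊕ 0 give the points ±i/2 of ∂W(A) ∩ ∂W(B), the vector 0 ⊕ 0 ⊕ 1 gives c, and since the size is 3 this already forces k(A) = k1(B) + 1 = 3. Boundary membership is tested against the support function of W(A).

import numpy as np

def on_boundary(A, xi, grid=4000, tol=1e-8):
    """xi in W(A) lies on the boundary iff Re(e^{-i t} xi) attains
    max-eig(Re(e^{-i t} A)) for some direction t (up to numerical tolerance)."""
    gaps = []
    for t in np.linspace(0, 2 * np.pi, grid, endpoint=False):
        H = (np.exp(-1j * t) * A + (np.exp(-1j * t) * A).conj().T) / 2
        gaps.append(np.linalg.eigvalsh(H)[-1] - np.real(np.exp(-1j * t) * xi))
    return min(gaps) < tol

B = np.array([[0, 1], [0, 0]], dtype=complex)
A = np.block([[B, np.zeros((2, 1))], [np.zeros((1, 2)), 3 * np.eye(1)]])    # A = B ⊕ [3]

Z = np.array([[1, 1j, 0], [1, -1j, 0], [0, 0, np.sqrt(2)]]).T / np.sqrt(2)  # columns z1, z2, z3
print(np.allclose(Z.conj().T @ Z, np.eye(3)))                               # orthonormal
for j in range(3):
    xi = Z[:, j].conj() @ A @ Z[:, j]
    print(np.round(xi, 4), on_boundary(A, xi))                              # ±0.5i and 3, all on ∂W(A)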
We remark that the part of the proof of Lemma 2.2.3 on c /∈ W (B) involves the following three cases (1), (2), and (3) depending on whether ∂W (B) contains a line segment or otherwise. In case (1), we have R = {zj : yj 6= 0} and S = {zj : yj 6= 0},
in (2) R = R1 ∪ R2, where R1 = {zj : yj 6= 0} and R2 = {zj : yj = 0}, and
S = {zj : yj 6= 0}, and in (3) R = R1 ∪ R2, where R1 = {zj : yj 6= 0} and
R2 = {zj : yj = 0}, and S = S1∪ S2, where S1 = {zj : yj 6= 0} and S2 = {zj : yj = 0}.
Note that the key point is to handle R and S in (1), R1 and S in (2), and R1 and
S1 in (3), that is, all nonzero yj’s of the three cases. We find that the proofs of the
three cases are almost the same. This observation can facilitate the proof of Theorem 2.2.2 as follows. If ∂W (B) contains a line segment such that this line segment is a portion of ∂W (A) and stretches to a point of ∂W (C), then we take the same method as the proof of Lemma 2.2.3 on c /∈ W (B) to partition the corresponding R into R1 = {zj : yj 6= 0} and R2 = {zj : yj = 0}. As mentioned above, we need only handle
R1. On the other hand, if ∂W (B) contains no such line segments, then we need only
handle the corresponding R = {zj : yj 6= 0}. From this, there is no difference between
the proofs of the two cases. Hence we may assume, in the proof of Theorem 2.2.2, that ∂W (B) and ∂W (C) contain no line segments.
Before giving a proof of Theorem 2.2.2, we note several things. First of all, by Lemma 2.2.3, we may assume that both of the numerical ranges W (B) and W (C) are not singletons. Secondly, we may further assume that ∂W (B) and ∂W (C) contain no line segment by the above remark. Thirdly, since W (A) is the convex hull of the union of W (B) and W (C), there are two line segments, called [a, p] and [b, q], in ∂W (A), where a, b ∈ ∂W (B) and p, q ∈ ∂W (C). Fourthly, it is easy to check that a 6= b and
p 6= q. Indeed, if a = b, then a is a corner. By [3, Theorem 1], we obtain that a is a reducing eigenvalue of A, and hence a is a reducing eigenvalue of B. This shows that W (B) must contain a line segment, which contradicts our previous assumption. Similarly, we also have p 6= q. Combining the above, we have the following Figure 2.2.5 as the numerical range W (A).
[Figure 2.2.5: W(A) as the convex hull of the disjoint numerical ranges W(B) and W(C), joined by the boundary segments [a, p] and [b, q].]
As before, by the definition of k(A), there exist ξj = hAzj, zji ∈ ∂W (A), j =
1, 2, . . . , k(A), where zj = xj ⊕ yj, and hzi, zji = δij for i, j = 1, ..., k(A). We define
four (disjoint) subsets consisting of the corresponding unit vectors, and their cardinal numbers, respectively, as follows:
R ≡ {zj : ξj ∈ (a, p)} with r ≡ # (R) ,
S ≡ {zj : ξj ∈ (b, q)} with s ≡ # (S) ,
TB ≡ {zj : ξj ∈ ∂W (A) ∩ ∂W (B)} with t1 ≡ # (TB) , and
TC ≡ {zj : ξj ∈ ∂W (A) ∩ ∂W (C)} with t2 ≡ # (TC) .
Since the intersection of W (B) and W (C) is empty, and ∂W (B) and ∂W (C) contain
no line segment, we may assume that
R = {zj = xj ⊕ yj : xj ≠ 0 and yj ≠ 0}_{j=1}^{r},
S = {zj = xj ⊕ yj : xj ≠ 0 and yj ≠ 0}_{j=r+1}^{r+s},
TB = {zj = xj ⊕ 0 : xj ≠ 0}_{j=r+s+1}^{r+s+t1}, and
TC = {zj = 0 ⊕ yj : yj ≠ 0}_{j=r+s+t1+1}^{r+s+t1+t2}.
So, k(A) = r + s + t1 + t2, k1(B) ≥ t1 and k1(C) ≥ t2. Clearly, the inequality
k(A) ≥ k1(B) + k1(C) holds. Now we are ready to prove Theorem 2.2.2.
Proof of Theorem 2.2.2. We need only prove that the reversed inequality k1(B) +
k1(C) ≥ k(A) holds. First, we consider the case r = 0. Assume that s = 0. Then our
assertion is obvious since
k1(B) + k1(C) ≥ t1+ t2 = r + s + t1+ t2 = k(A).
Assume that s = 1, i.e., z1 = x1⊕ y1 ∈ S. Then k1(B) ≥ t1+ 1 since the unit vector
(x1/ kx1k) ⊕ 0 is clearly orthogonal to TB and hB (x1/ kx1k) , x1/ kx1ki is in ∂W (B)
by the convex combination
⟨Az1, z1⟩ = ‖x1‖² ⟨B(x1/‖x1‖), x1/‖x1‖⟩ + ‖y1‖² ⟨C(y1/‖y1‖), y1/‖y1‖⟩ ∈ (b, q).

Hence k1(B) + k1(C) ≥ (t1 + 1) + t2 = r + s + t1 + t2 = k(A).
Assume that s = 2, i.e., z1 = x1⊕ y1 and z2 = x2⊕ y2 ∈ S. If x1 and x2 are linearly
independent, then by the Gram-Schmidt process, there are two unit vectors z′ 1 and
z′
2, where zj′ = x′j ⊕ y′j with x′j 6= 0 for j = 1, 2 such that x′1 and x′2 are mutually
orthogonal, and span ({z1, z2}) is equal to span ({z1′, z′2}). Choosing the two unit
vectors (x′1/‖x′1‖) ⊕ 0 and (x′2/‖x′2‖) ⊕ 0, we obtain that k1(B) ≥ t1 + 2. Hence k1(B) + k1(C) ≥ (t1 + 2) + t2 = r + s + t1 + t2 = k(A).
On the other hand, if x1 and x2 are linearly dependent, say, x2 = λx1 for some scalar
λ, then we define a new unit vector
z2′ = z2− λz1
kz2− λz1k = 0 ⊕
y2− λy1
ky2− λy1k ∈ span ({z 1, z2})
so that span ({z1, z2}) = span ({z1′}) ⊕ span ({z′2}) for some unit vector z1′ ≡ x′1⊕ y1′,
where z′
1 and z2′ are mutually orthogonal. Clearly, x′1 6= 0 for otherwise it leads
to x1 = x2 = 0, which contradicts the definition of S. From the two unit vectors
(x′
1/ kx′1k) ⊕ 0 and z2′, we infer that k1(B) ≥ t1+ 1 and k1(C) ≥ t2 + 1. Hence
k1(B) + k1(C) ≥ (t1+ 1) + (t2+ 1) = r + s + t1 + t2 = k(A).
Assume that s ≥ 3, that is, S = {zj = xj ⊕ yj : xj 6= 0 and yj 6= 0}sj=1. We consider
the largest linearly independent subset of {xj}sj=1 as follows. Without loss of
gener-ality, we may assume that this can be {xj}sj=1, {x1} or {xj}lj=1, where 1 < l < s. For
the first two cases, it can be done by applying similar arguments as for the case of s = 2. In the last case, since xj is a linear combination of x1, ..., xl for j = l + 1, ..., s,
it is easy to check that the unit vectors
(1) zj′ ≡ zj− Σ l i=1a (j) i zi zj− Σ l i=1a (j) i zi = 0 ⊕ yj − Σli=1a (j) i yi yj − Σ l i=1a (j) i yi , j = l + 1, ..., s,
are linearly independent. Let y′ j = yj−Σli=1a(j)i yi yj−Σ l i=1a (j) i yi for j = l + 1, ..., s. Since F ≡ span z′ j = 0 ⊕ yj′ s j=l+1
is a subspace of the space V ≡ span {zj}sj=1
, the or-thogonal complement of F in V , called E, can be written as span z′
j ≡ x′j ⊕ y′j
l
j=1
for some unit vectors z′
j, j = 1, ..., l. By (1), we see that {x′j}lj=1 is linearly
indepen-dent since {xj}lj=1 is linearly independent. Hence we may assume that both {x′j}lj=1
andy′ j
s
j=l+1are orthogonal subsets by the Gram-Schmidt process. This shows that
G1 ≡ x′ j/ x′j ⊕ 0 l j=1 and G2 ≡ 0 ⊕ y ′ j s
j=l+1 are orthogonal to TB and TC,
respectively. Since every vector v in G1 (resp., G2) is such that hAv, vi is in ∂W (B)
(resp., ∂W (C)), we obtain that k1(B) + k1(C) ≥ k(A) from k1(B) ≥ t1 + l and
k1(C) ≥ t2+ s − l. This completes the proof of the case r = 0.
Next, we prove for the case r = 1. Obviously, it is sufficient to consider s ≥ 1 since the case r = 1, s = 0 is the same as the case r = 0, s = 1. Assume that s = 1, that is, z1 = x1 ⊕ y1 ∈ R and z2 = x2 ⊕ y2 ∈ S. Then k1(B) ≥ t1 + 1 and k1(C) ≥ t2 + 1 since
(x1/kx1k) ⊕ 0 and 0 ⊕ (y2/ky2k) are orthogonal to TB and TC, respectively. Moreover,
hB (x1/ kx1k) , x1/ kx1ki is in the boundary of W (B) by the convex combination
⟨Az1, z1⟩ = ‖x1‖² ⟨B(x1/‖x1‖), x1/‖x1‖⟩ + ‖y1‖² ⟨C(y1/‖y1‖), y1/‖y1‖⟩ ∈ (a, p),
and hC (y2/ ky2k) , y2/ ky2ki is in the boundary of W (C) by the same arguments.
Hence
k1(B) + k1(C) ≥ (t1+ 1) + (t2+ 1) = r + s + t1 + t2 = k(A).
Assume that s = 2. Then we have R = {z1 = x1⊕ y1 : x1 6= 0 and y1 6= 0} and
S = {zj = xj ⊕ yj : xj 6= 0 and yj 6= 0}3j=2. If {x2, x3} is linearly independent, then
we may assume that it is an orthogonal set by the Gram-Schmidt process. By the con-vex combination mentioned above, we infer from the three unit vectors 0 ⊕ (y1/ ky1k),
(x2/ kx2k) ⊕ 0, and (x3/ kx3k) ⊕ 0 that k1(B) ≥ t1 + 2 and k1(C) ≥ t2+ 1. Hence
k1(B) + k1(C) ≥ (t1 + 2) + (t2 + 1) = r + s + t1 + t2 = k(A).
On the other hand, if {x2, x3} is linearly dependent, say, x2 = λx3 for some scalar λ,
then we define a new unit vector
z2′ = z2− λz3
kz2− λz3k = 0 ⊕
y2− λy3
ky2− λy3k ∈ span ({z2
, z3})
so that span ({z2, z3}) = span ({z2′}) ⊕ span ({z′3}) for some unit vector z3′ ≡ x′3⊕ y3′,
where z′
2 is orthogonal to z3′. Clearly, x′3 6= 0 for otherwise it leads to x2 = x3 = 0,
which contradicts the definition of S. From the three unit vectors 0 ⊕ (y1/ ky1k),
0 ⊕ ((y2− λy3) / ky2− λy3k), and (x′3/ kx′3k) ⊕ 0, we infer that k1(B) ≥ t1 + 1 and
k1(C) ≥ t2 + 2. Hence k1(B) + k1(C) ≥ (t1 + 1) + (t2 + 2) = r + s + t1 + t2 = k(A).
Assume that s ≥ 3, that is, S = {zj = xj ⊕ yj : xj 6= 0 and yj 6= 0}s+1j=2, and R =
{z1 = x1⊕ y1 : x1 6= 0 and y1 6= 0}. We consider the largest linearly independent
subset of {xj}s+1j=2, which we may assume to be {xj}j=2s+1, {x2} or {xj}lj=2, where
2 < l < s + 1. These three largest subsets are similar to those considered under r = 0, s ≥ 3. Indeed, we need only add the unit vector 0 ⊕ (y1/ ky1k) to every
sub-case of the case r = 0, s ≥ 3. Hence we have proved that the reversed inequality k1(B) + k1(C) ≥ k(A). This completes the proof of the case r = 1.
Let r = 2. With the help of the preceding discussions, we may assume that s ≥ 2. Assume that s = 2, that is, R = {zj = xj ⊕ yj : xj 6= 0 and yj 6= 0}2j=1 and S =
{zj = xj ⊕ yj : xj 6= 0 and yj 6= 0}4j=3. If {x3, x4} is linearly independent, then we
consider two cases as follows. First, we assume that {y1, y2} is linearly independent.
We may further assume that {x3, x4} and {y1, y2} are orthogonal subsets by the
Gram-Schmidt process. Obviously, the two subsets H1 ≡ {0 ⊕ (y1/ ky1k) , 0 ⊕ (y2/ ky2k)}
and H2 ≡ {(x3/ kx3k) ⊕ 0, (x4/ kx4k) ⊕ 0} are orthogonal to TC and TB, respectively.
Since every vector v in H1 (resp., H2) is such that hAv, vi is in the boundary of W (C)
(resp., W (B)), we infer, from k1(B) ≥ t1+2 and k1(C) ≥ t2+2, that k1(B)+k1(C) ≥
k(A). On the other hand, assume that {y1, y2} is linearly dependent, say, y1 = λy2
for some scalar λ. Then we define a new unit vector z′
1 = (z1− λz2)/kz1 − λz2k =
((x1−λx2)/kx1−λx2k)⊕0 so that span ({z1, z2}) = span ({z1′})⊕span ({z2′}) for some
unit vector z′
2 ≡ x′2⊕ y2′, where z1′ and z2′ are mutually orthogonal. Clearly, y2′ 6= 0 for
otherwise it leads to y1 = y2 = 0, which contradicts the definition of R. Moreover,
we may assume that {x3, x4} is an orthogonal subset by the Gram-Schmidt
pro-cess. Hence H3 ≡ {((x1 − λx2) / kx1 − λx2k) ⊕ 0, (x3/ kx3k) ⊕ 0, (x4/ kx4k) ⊕ 0}
and H4 ≡ {0 ⊕ (y′2/ ky2′k)} are orthogonal to TB and TC, respectively. Since every
vector v in H3 (resp., H4) is such that hAv, vi is in the boundary of W (B) (resp.,
W (C)), we infer, from k1(B) ≥ t1+ 3 and k1(C) ≥ t2+ 1, that k1(B) + k1(C) ≥ k(A).
On the other hand, if {x3, x4} is linearly dependent, then we need only consider
the case that {y1, y2} is linearly dependent. So, we may assume that y1 = λy2 and
x3 = µx4 for some scalars λ and µ. Define two new unit vectors

z′1 = (z1 − λz2)/‖z1 − λz2‖ = ((x1 − λx2)/‖x1 − λx2‖) ⊕ 0  and  z′3 = (z3 − µz4)/‖z3 − µz4‖ = 0 ⊕ ((y3 − µy4)/‖y3 − µy4‖).
Then span ({z1, z2}) = span ({z1′})⊕span ({z2′}) and span ({z3, z4}) = span ({z′3})⊕
span ({z′
4}) for some unit vectors z2′ = x′2⊕ y2′ and z′4 = x′4⊕ y4′, where z2′ (resp., z4′)
is orthogonal to z′
1 (resp., z′3). Clearly, y2′ and x′4 are nonzero by the same argument
as above. Hence H5 ≡ {((x1− λx2) / kx1− λx2k) ⊕ 0, (x′4/ kx′4k) ⊕ 0} and H6 ≡
{0 ⊕ (y′
2/ ky2′k) , 0 ⊕ ((y3− λy4) / ky3− λy4k)} are orthogonal to TB and TC,
respec-tively. Since every vector v in H5 (resp., H6) is such that hAv, vi is in the boundary
of W (B) (resp., W (C)), we infer, from k1(B) ≥ t1+2 and k1(C) ≥ t2+2, that k1(B)+
k1(C) ≥ k(A). Assume that s ≥ 3, that is, R = {zj = xj ⊕ yj : xj 6= 0 and yj 6= 0}2j=1,
and S = {zj = xj ⊕ yj : xj 6= 0 and yj 6= 0}s+2j=3. If {y1, y2} is linearly independent,
then we may assume that {y1, y2} is orthogonal by the Gram-Schmidt process. In
this case, we consider the largest linearly independent subset of {xj}s+2j=3, which may
be assumed to be {xj}s+2j=3, {x3} or {xj}lj=3 (3 < l < s + 2). Each of the three
cases can be handled by applying similar arguments as for the cases of r = 0, s ≥ 2. On the other hand, if {y1, y2} is linearly dependent, say, y1 = λy2 for some
scalar λ, then we define a new unit vector z′
1 = ((x1− λx2)/kx1− λx2k) ⊕ 0 so that
span ({z1, z2}) = span ({z1′}) ⊕ span ({z′2}) for some unit vector z′2 = x′2⊕ y′2, where z1′
and z′
2 are mutually orthogonal. Clearly, y2′ is nonzero by the same argument as for
the case of r = 0, s = 2. To complete the proof, it remains to consider the three cases mentioned above. By applying similar arguments again as for the cases of r = 0, s ≥ 2, we obtain the reversed inequality k1(B) + k1(C) ≥ k(A). This completes the
proof of the case r = 2.
Finally, assume that r ≥ 3. It suffices to consider s ≥ 3 since s ≤ 2 has been proven if we exchange the roles of s and r. Hence R = {zj = xj ⊕ yj : xj 6= 0 and yj 6= 0}rj=1
and S = {zj = xj⊕ yj : xj 6= 0 and yj 6= 0}r+sj=r+1. As mentioned previously, there are
{xj}r+sj=r+1). Without loss of generality, we may assume that this subset is {yj}rj=1, {y1}
or {yj}lj=11 , where 1 < l1 < r, and {xj}j=r+1r+s , {xr+1} or {xj}r+lj=r+12 , where 1 < l2 < s.
There are a total of nine cases to be considered. Since each case is similar to the one under r = 0, s ≥ 1, it follows that the reversed inequality k1(B) + k1(C) ≥ k(A)
holds. This completes the proof of the case r ≥ 3. At the end of the section, we give a generalization of Theorem 2.2.2 under a slightly weaker condition on B and C. Let A be a matrix of the form B ⊕ C. Since W (A) is the convex hull of the union of W (B) and W (C), we consider two (disjoint) subsets of ∂W (A) as follows: one is ∂W (A) \ (∂W (B) ∪ ∂W (C)) ≡ Γ1, and the other
is ∂W (A) ∩ ∂W (B) ∩ ∂W (C) ≡ Γ2. Geometrically, Γ1 consists of the line segments
contained in ∂W (A) but not in ∂W (B) ∪ ∂W (C). On the other hand, since the common boundaries of the three numerical ranges consist of line segments and points which are not in any line segments, every point of the latter can be regarded as a degenerate line segment. Hence Γ2 consists of the (possibly degenerate) line segments
contained in the common boundaries of the three numerical ranges. If Γ ≡ Γ1∪ Γ2
consists of at most two (possibly degenerate) line segments, then we say that W (A) has property Λ. Evidently, the disjointness of W (B) and W (C) implies that property Λ holds since Γ1 consists of exactly two line segments and Γ2 is empty.
Applying similar arguments as in the proof of Theorem 2.2.2, property Λ is enough to establish the equality k(A) = k1(B)+k1(C). Hence we have the following theorem.
Theorem 2.2.6. Let A = B⊕C, where B and C are n-by-n and m-by-m matrices,
respectively. If W (A) has property Λ, then k(A) = k1(B) + k1(C) ≤ k(B) + k(C). In this case, k(A) = k(B) + k(C) if and only if k1(B) = k(B) and k1(C) = k(C). In particular, k(A) = m + n if and only if k1(B) = k(B) = n and k1(C) = k(C) = m.
2.3  Applications and discussions
The first application of our results in Section 2.2 is a generalization of Lemma 2.2.3. Indeed, we are able to determine the value of k(A) for A = B ⊕ C with normal C.
Proposition 2.3.1. Let A = B ⊕ C, where C is an m-by-m normal matrix.
Then k(A) = k1(B) + k1(C). In this case, k(A) = k(B) + k(C) if and only if
k1(B) = k(B) and k1(C) = k(C). In particular, if C = cIm for some scalar c, then
k(A) = k1(B) + k1(cIm).
Proof. Let the normal C be unitarily similar to ⊕m
j=1[cj]. By [19, Lemma 2.9], we
may assume that all the cj’s lie in ∂W (A). This shows that k1(C) = m immediately.
On the other hand, we also obtain k(A) = k1(B) + m by Lemma 2.2.3. Hence the
asserted equality k(A) = k1(B) + k1(C) has been proven. The remaining assertions
hold trivially by this equality.
An easy corollary of Proposition 2.3.1 is to determine when k (A) equals the size of A for a matrix A = B ⊕ C with normal C.
Corollary 2.3.2. Let A = B ⊕ C, where B is an n-by-n matrix and C is an
m-by-m normal matrix. Then k(A) = n + m if and only if k1(B) = n and k1(C) = m. Assume, moreover, that dim Hη = 1 for all η ∈ ∂W (B). Then k(A) = n + m if and only if k1(B) = n ≤ 2 and k1(C) = m.
Proof. By Proposition 2.3.1, it is clear that k(A) equals the size of A if and only if
k1(B) and k1(C) equal the sizes of B and C, respectively. In this case, the assumption
on Hη implies that k1(B) = n ≤ 2 by [19, Proposition 2.10]. This completes the proof.
In what follows, we use the set Γ ≡ Γ1 ∪ Γ2 defined at the end of Section 2.2, where Γ1 = ∂W(A) \ (∂W(B) ∪ ∂W(C)) and Γ2 = ∂W(A) ∩
∂W (B) ∩ ∂W (C). The next proposition gives a lower bound for k(A).
Proposition 2.3.3. Let A = B ⊕ C be an n-by-n (n ≥ 3) matrix. Then Γ is
empty if and only if the numerical range of one summand is contained in the interior of the numerical range of the other. In particular, if Γ is nonempty, then k(A) ≥ 3.
Proof. If Γ = Γ1 ∪ Γ2 is empty, then both Γ1 and Γ2 are empty. Since Γ1 is
empty, ∂W (A) is contained in ∂W (B) ∪ ∂W (C). This implies that W (B) ∩ W (C) is nonempty, and thus W (B) = W (C), W (B) ⊆ int W (C) or W (C) ⊆ int W (B). Moreover, Γ2 = φ implies that W (B) 6= W (C). With this, we conclude that either
W (B) ⊆ int W (C) or W (C) ⊆ int W (B). The converse is obvious. Hence we have proved the first assertion. Let Γ be nonempty, that is, either Γ1 or Γ2 is nonempty. If
Γ1 is nonempty, then there is a line segment on the boundary of W (A). This shows
that k(A) ≥ 3 by [19, Corollary 2.5]. On the other hand, if Γ2 is nonempty, then
there is a (possibly degenerate) line segment on the common boundaries of the three numerical ranges W(A), W(B) and W(C). Using [19, Corollary 2.5] again, we may assume that the line segment is degenerate, say, to {ξ}. This implies immediately that dim Hξ(A) ≥ 2. Thus k(A) ≥ 3 by [19, Proposition 2.4].
As an application, when A is reducible, the next corollary gives a necessary and sufficient condition for k(A) = 2.
Corollary 2.3.4. Let A = B ⊕ C be an n-by-n (n ≥ 3) matrix. Then k(A) = 2
if and only if either k(B) = 2 and W (C) ⊆ int W (B), or k(C) = 2 and W (B) ⊆
int W (C).
Proof. If k(A) = 2, then Proposition 2.3.3 shows that Γ is empty, and thus the
numerical range of one summand, say, B is contained in the interior of the numerical range of C. Hence k(C) = 2 by [19, Lemma 2.9]. The converse is obvious by [19,
Lemma 2.9] again. The following proposition determines exactly when k(A) equals the size of A for an irreducible matrix A. It is also stated in [2, Theorem 7] while the proof there is different from ours.
Proposition 2.3.5. Let A be an n-by-n (n ≥ 3) irreducible matrix. Then k(A) = n if and only if ∂W (A) contains a line segment l and there are n points (not necessarily
distinct) in l ∪ (∂W (A) ∩ L), where L is the supporting line parallel to l such that their corresponding unit vectors form an orthonormal basis for Cn.
Proof. We need only prove the necessity. Assume that A is an n-by-n (n ≥ 3)
irreducible matrix with k(A) = n. If ∂W (A) contains no line segment, then dim Hξ=
dim Eξ,l≤ n/2 for all ξ ∈ ∂W (A) by [19, Proposition 2.2]. If n is odd, say, n = 2m+1,
then dim Hξ = dim Eξ,l ≤ m for all ξ ∈ ∂W (A). Since k(A) = n, it follows from [19,
Theorem 2.7] that A is reducible, which is absurd. If n is even, say, n = 2m, then m ≥ 2 by our assumption that n ≥ 3. Since k(A) = n and ∂W (A) contains no line segment, A is unitarily similar to a matrix of the form
$$\begin{bmatrix} \xi I_m & e^{i\theta}D \\ -e^{i\theta}D^* & \eta I_m \end{bmatrix}$$

by [19, Theorem 2.7], where dim Hξ = dim Hη = m. Let D = USV be the singular value decomposition of D, where U and V are unitary and S = diag(s1, ..., sm) is a diagonal matrix with sj ≥ 0, j = 1, ..., m. Then

$$\begin{bmatrix} U^* & 0 \\ 0 & V \end{bmatrix}\begin{bmatrix} \xi I_m & e^{i\theta}D \\ -e^{i\theta}D^* & \eta I_m \end{bmatrix}\begin{bmatrix} U & 0 \\ 0 & V^* \end{bmatrix} = \begin{bmatrix} \xi I_m & e^{i\theta}S \\ -e^{i\theta}S & \eta I_m \end{bmatrix}$$

and the latter is unitarily similar to

$$\bigoplus_{j=1}^{m} \begin{bmatrix} \xi & e^{i\theta}s_j \\ -e^{i\theta}s_j & \eta \end{bmatrix}.$$
This contradicts the irreducibility of A. Hence ∂W (A) must contain a line segment. We then apply [19, Theorem 2.7] again to complete the proof. An easy corollary of Proposition 2.3.5 is the following upper bound for k(A). This was given in [19, Proposition 2.10]. Here we give a simpler proof.
Corollary 2.3.6. If A is an n-by-n (n ≥ 3) matrix with dim Hξ = 1 for all ξ ∈ ∂W (A), then k(A) ≤ n − 1.
Proof. Assume that k(A) = n. It suffices to consider that A is reducible; this
is because if otherwise, then Proposition 2.3.5 shows that ∂W (A) contains a line segment, which contradicts the assumption on Hξ. Let A = B ⊕ C. Then our
assumption on Hξ implies that Γ is empty. By Proposition 2.3.3, we obtain that the
numerical range of one summand is contained in the interior of the numerical range of the other summand. It follows from [19, Lemma 2.9] that the value of k(A) equals k(B) or k(C). Thus k(A) ≤ n − 1 as asserted. We now combine Proposition 2.3.1, Corollary 2.3.2, Corollary 2.3.4, and Proposi-tion 2.3.5 to determine the value of k(A) for any 4-by-4 reducible matrix A. Corollary 2.3.4 shows exactly when the value of k(A) equals two. By Proposition 2.3.1, Corol-lary 2.3.2 and Proposition 2.3.5, we get a necessary and sufficient condition for the value of k(A) to be equal to four. In other words, the value of k(A) can be determined completely for any 4-by-4 reducible matrix A. To do this, we note that a 4-by-4 re-ducible matrix A can be written, after a unitary similarity, as (i) A = B ⊕ [c], where B is a 3-by-3 irreducible matrix and c is a complex number, (ii) A = B ⊕ [c], where B is a 3-by-3 reducible matrix and c is a complex number, or (iii) A = B ⊕ C, where B and C are 2-by-2 irreducible matrices. Proposition 2.3.7 below is to deal with case (i).
Recall that for a 3-by-3 irreducible matrix A, W (A) is of one of the following
shapes (cf. [9]): an elliptic disc, the convex hull of a heart-shaped region, in which case ∂W (A) contains a line segment, and an oval region.
Proposition 2.3.7. Let A = B ⊕ [c], where B is a 3-by-3 irreducible matrix and c is a complex number. Then k(A) = 4 if and only if c /∈ int W (B) and {a1, a2, b} ⊆
∂W (A), where W (B) is the convex hull of a heart-shaped region, in which case ∂W (B)
contains a line segment [a1, a2] contained in the supporting line L1 of W (B) and L2 is the supporting line of W (B) passing through b and parallel to L1.
Proof. By Corollary 2.3.2, we see that k(A) = 4 is equivalent to k1(B) = 3
and k1([c]) = 1. Since a necessary and sufficient condition for k1([c]) = 1 is that
c /∈ int W (B), it remains to show that k1(B) = 3 if and only if {a1, a2, b} ⊆ ∂W (A)
and W (B) satisfies the asserted properties. If k1(B) = 3, then k(B) = 3. Hence
it follows from Proposition 2.3.5 that ∂W (A) contains {a1, a2, b}, and W (B) is as
asserted. The converse is trivial. For case (ii), let A = B ⊕ [c], where B is a 3-by-3 reducible matrix. After a unitary similarity, B can be written as C ⊕ [b], where C is a 2-by-2 matrix, so that k(A) = k1(C) + k1([b] ⊕ [c]) by Proposition 2.3.1. The following proposition gives a
necessary and sufficient condition for k(A) to be equal to four.
Proposition 2.3.8. Let A = C ⊕ [b] ⊕ [c], where C is a 2-by-2 matrix, and b and c are complex numbers. Then k(A) = 4 if and only if both b and c are in ∂W (A) and k1(C) = 2.
Proof. By Corollary 2.3.2, it is obvious that k (A) = 4 if and only if k1(C) = 2
and k1([b] ⊕ [c]) = 2. Moreover, it is also clear that k1([b] ⊕ [c]) = 2 is equivalent to
both of b and c being in ∂W (A). Hence the proof is complete. To prove for case (iii), let A = B ⊕ C, where B and C are 2-by-2 irreducible
matrices. Since W (A) is the convex hull of the union of the two elliptic discs W (B) and W (C), either W (B) equals W (C), or Γ consists of at most four (possibly degen-erate) line segments. With this, we are now ready to give a necessary and sufficient condition for k(A) = 4.
Proposition 2.3.9. Let A = B ⊕ C, where B and C are 2-by-2 irreducible
matri-ces. Then k(A) = 4 if and only if Γ consists of at least three line segments (including the possibly degenerate ones), or Γ consists of exactly two (possibly degenerate) line segments such that k1(B) = k1(C) = 2.
Proof. If Γ consists of more than four (possibly degenerate) line segments, then
the two elliptic discs W (B) and W (C) are identical. Hence k(A) = 4 by direct computations. If Γ consists of four or three (possibly degenerate) line segments, then the endpoints of the major axes of the two elliptic discs W (B) and W (C) are in ∂W (A). Hence k(A) = 4. If Γ consists of exactly two (possibly degenerate) line segments such that k1(B) = k1(C) = 2, then k(A) = 4 by Theorem 2.2.6. Therefore
we have proved the sufficient condition for k(A) = 4. Next assume that k(A) = 4 and either Γ consists of exactly two (possibly degenerate) line segments such that the equalities k1(B) = k1(C) = 2 fail, or Γ consists of at most one (possibly degenerate)
line segment. Since property Λ holds in each case, we must have k1(B) = k1(C) = 2
by Theorem 2.2.6. This shows that we need only consider the latter. If Γ consists of exactly one (possibly degenerate) line segment, then Γ1 is empty and Γ2is a singleton.
Hence we may assume that W (B) is contained in W (C) and the intersection of W (B) and W (C) is Γ. This shows that k1(B) = 1 and k1(C) = 2, which is a contradiction.
If Γ is empty, then it follows from Proposition 2.3.3 that the numerical range of one summand, say, B is contained in the interior of the numerical range of the other summand C. By Corollary 2.3.4 and [5, Lemma 4.1], we see that k(A) = k(C) = 2, which is absurd. This completes the proof.
As a final application of Theorem 2.2.6, it is obvious that the convex hull of the union of W (A) and W (A + aIn) has property Λ for any a 6= 0. Hence we obtain the
following proposition.
Proposition 2.3.10. Let A be an n-by-n matrix and a be a nonzero complex num-ber. Then k(A ⊕ (A + aIn)) = k1(A) + k1(A + aIn). In this case, k(A ⊕ (A + aIn)) =
2k(A) if and only if k1(A + aIn) = k1(A) = k(A).
We conclude this paper by stating the following open questions concerning this topic. Is it true that the equality k(A) = k1(B) + k1(C) holds for a matrix A of the
form B ⊕C even if property Λ fails? We note that although property Λ fails, the men-tioned formula may still be correct (cf. Proposition 2.3.1). Another natural example of the failure of property Λ is that both W (B) and W (C) have the same numerical range. Is it true that k (B ⊕ C) = k(B) + k(C) in this case? In particular, can we determine the value of k (A ⊕ A) (cf. Proposition 2.3.10)? The following proposition gives a partial answer for k (A ⊕ A) if we assume, in addition, that dim Hξ = 1 for
all ξ ∈ ∂W (A).
Proposition 2.3.11. If A is an n-by-n matrix with dim Hξ = 1 for all ξ ∈
∂W(A), then k(⊕_{j=1}^{m} A) = m · k(A).
Proof. Obviously, the inequality k(⊕_{j=1}^{m} A) ≥ m · k(A) holds. To prove the reversed inequality, we consider, for convenience, the case m = 2. Let ξ1 ∈ ∂W(A ⊕ A).
Then dim Hξ1(A ⊕ A) = 2 by our assumption on Hξ(A). Hence the subspace Hξ1(A ⊕ A) is spanned by the two unit vectors x1 ⊕ 0 and 0 ⊕ x1, where ξ1 = ⟨Ax1, x1⟩. Let z1 be a unit vector in Hξ1(A ⊕ A). Then z1 = (α1x1 ⊕ α2x1)/√(|α1|² + |α2|²), where α1 and α2 are in C. Similarly, for a point ξ2 in ∂W(A ⊕ A), the subspace Hξ2(A ⊕ A) is spanned by the two unit vectors x2 ⊕ 0 and 0 ⊕ x2, where ξ2 = ⟨Ax2, x2⟩. Moreover, if z2 is a unit vector in Hξ2(A ⊕ A), then z2 = (β1x2 ⊕ β2x2)/√(|β1|² + |β2|²), where β1 and β2 are in C. Obviously, the orthogonality of z1 and z2 is equivalent to

(α1β̄1 + α2β̄2)⟨x1, x2⟩ = 0,  that is,  ⟨(α1, α2), (β1, β2)⟩ ⟨x1, x2⟩ = 0.
This shows that k(A ⊕ A) ≤ 2k(A) immediately by the definition of k(A). For general m, a similar argument as above yields that
⟨(α1, ..., αm), (β1, ..., βm)⟩ ⟨x1, x2⟩ = 0
for some scalars α1, ..., αm and β1, ..., βm, where x1 and x2 are similarly defined. Since
the dimension of C^m is m, the number of vectors of the form [α1, ..., αm]^T which are orthogonal to each other is at most m. We infer from this and the above equality that the reversed inequality k(⊕_{j=1}^{m} A) ≤ m · k(A) holds. Therefore we have the asserted
equality.
At the end of this section, we apply Proposition 2.3.11 to the quadratic matrices. Recall that an n-by-n quadratic matrix A is unitarily similar to a matrix of the form
$$aI_{n_1} \oplus bI_{n_2} \oplus \begin{bmatrix} aI_{n_3} & D \\ 0 & bI_{n_3} \end{bmatrix},$$

where n1, n2, n3 ≥ 0, n1 + n2 + 2n3 = n, D > 0, and a, b ∈ σ(A) (cf. [18, Theorem
2.1]).
Corollary 2.3.12. If A is an n-by-n quadratic matrix of the above form and D is not missing, then k(A) = 2 · # ({λ ∈ σ (D) : λ = kDk}).
Proof. If D > 0, then D is unitarily similar to diag(d1, ..., d_{n3}), where d1 = · · · = dp = ‖D‖ ≡ d > d_{p+1} ≥ · · · ≥ d_{n3} ≥ 0 (1 ≤ p ≤ n3). Hence A is unitarily similar to a matrix of the form aI_{n1} ⊕ bI_{n2} ⊕ (⊕_{j=1}^{p} B) ⊕ (⊕_{j=p+1}^{n3} Bj), where n1 + n2 + 2n3 = n, B ≡ [a d; 0 b], and Bj ≡ [a dj; 0 b], j = p + 1, . . . , n3.
Since the set {a, b} and all of the numerical ranges W(Bj), j = p + 1, . . . , n3, are
contained in the interior of W (B), it follows from [19, Lemma 2.9] that k(A) = k(⊕pj=1B). Since dim Hξ(B) = 1 for all ξ ∈ ∂W (B), we have k(A) = p · k(B) by
Proposition 2.3.11. Obviously, k(B) = 2 by [5, Lemma 4.1]. Thus k(A) = 2p as
asserted.
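A concrete instance of the corollary (a hedged numerical sketch; NumPy assumed, and the choice a = b = 0, n1 = n2 = 0, D = diag(2, 2, 1) is purely illustrative): here A is unitarily similar to the direct sum of the 2-by-2 blocks [0 dj; 0 0], W(A) is the closed disc of radius ‖D‖/2 = 1, and ‖D‖ = 2 is attained twice, so the corollary predicts k(A) = 4; four explicit orthonormal vectors attaining boundary points are exhibited below.

import numpy as np

d = np.array([2.0, 2.0, 1.0])                          # eigenvalues of D; ||D|| = 2 is attained twice
A = np.zeros((6, 6))
for j, dj in enumerate(d):                             # A is (unitarily similar to) the direct sum of [0 dj; 0 0]
    A[2 * j, 2 * j + 1] = dj

predicted_k = 2 * int(np.sum(d == d.max()))            # Corollary 2.3.12: k(A) = 2 * #{lambda in sigma(D): lambda = ||D||}
print(predicted_k)                                     # 4

V = np.zeros((6, 4))                                   # columns: (e1 ± e2)/sqrt(2), (e3 ± e4)/sqrt(2)
V[0, 0] = V[1, 0] = V[0, 1] = 1 / np.sqrt(2); V[1, 1] = -1 / np.sqrt(2)
V[2, 2] = V[3, 2] = V[2, 3] = 1 / np.sqrt(2); V[3, 3] = -1 / np.sqrt(2)
print(np.allclose(V.T @ V, np.eye(4)))                 # orthonormal
print([float(V[:, j] @ A @ V[:, j]) for j in range(4)])   # [1.0, -1.0, 1.0, -1.0], all on the circle |z| = 1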
We remark that in the preceding proof the equality k(⊕_{j=1}^{p} B) = 2p can also be established directly. Indeed, the inequality k(⊕_{j=1}^{p} B) ≥ 2p holds trivially and we can
3  Gau-Wu numbers of nonnegative matrices

3.1  Introduction
In Section 3.2 below, we first consider a matrix A of the form

$$\begin{bmatrix} 0 & A_1 & & \\ & 0 & \ddots & \\ & & \ddots & A_{m-1} \\ A_m & & & 0 \end{bmatrix} \quad (m \ge 2),$$

where the diagonal zeros are zero square matrices. In this case, we obtain that k(A) has a lower bound m (Proposition 3.2.1) if A has a boundary vector x = ⊕_{j=1}^{m} xj, that is, ⟨Ax, x⟩ ∈ ∂W(A), with all component vectors xj having the same norm 1/√m.
Next, we study a nonnegative matrix A of the above form with irreducible real part and Am = 0. Proposition 3.2.3 yields that k(A) ≤ m − 1. Moreover, with the help
of [19], we are able to give necessary and sufficient conditions for such a matrix A with the value of k(A) equal to m − 1 (Theorem 3.2.4). Finally, we also consider a nonnegative matrix A of the above form with irreducible real part. Example 3.2.6 shows that no analogous results hold for such an A. In Section 3.3, we consider more special nonnegative matrices, namely, the doubly stochastic matrices. It can be proven that k(A) equals 3 for any 3-by-3 doubly stochastic matrix (Proposition 3.3.2). Moreover, for a 4-by-4 doubly stochastic matrix A, we determine the value of k(A) completely and give the description of its numerical range W (A) (Propositions 3.3.4 and 3.3.5). For general n, we obtain the lower bound of k(A) for an n-by-n doubly stochastic matrix A (Theorems 3.3.6 and 3.3.7). In particular, for an n-by-n irreducible doubly stochastic matrix A, we obtain a necessary and sufficient condition for k(A) to be equal to this lower bound (Theorem 3.3.7).
We end this section by fixing some notations. For any finite matrix A, its trace,
determinant, and spectral radius are denoted by tr A, det A, and r(A), respectively. The number m of eigenvalues z of A with |z| = r(A) is called the index of imprimitivity of A.
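The index of imprimitivity just defined can be read off numerically by counting the eigenvalues of maximal modulus; a minimal sketch (NumPy assumed; the function name and the 3-cycle permutation are illustrative choices):

import numpy as np

def index_of_imprimitivity(A, tol=1e-9):
    """Count the eigenvalues z of A with |z| equal to the spectral radius r(A)."""
    eig = np.linalg.eigvals(A)
    r = np.max(np.abs(eig))                             # spectral radius r(A)
    return int(np.sum(np.abs(np.abs(eig) - r) < tol))

P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)                  # 3-cycle permutation: eigenvalues are the cube roots of unity
print(index_of_imprimitivity(P))                        # 3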
3.2  Nonnegative block shift matrix
We start by reviewing a couple of basic facts on a block shift matrix. Recall that a block shift matrix A is one of the form
$$\begin{bmatrix} 0_1 & A_1 & & \\ & 0_2 & \ddots & \\ & & \ddots & A_{m-1} \\ A_m & & & 0_m \end{bmatrix} \quad (m \ge 2),$$

where the diagonal zeros 0j (j = 1, ..., m) are zero square matrices. Let ϕ = 2π/m. Then it is easy to see that the numerical range W(A) is an m-symmetric compact convex region since U*AU = e^{iϕ}A, where U is a unitary matrix of the form

$$\begin{bmatrix} e^{i\varphi}I_1 & & & \\ & e^{2i\varphi}I_2 & & \\ & & \ddots & \\ & & & e^{mi\varphi}I_m \end{bmatrix},$$
where the diagonal identity matrix Ij is of the same size as the corresponding 0j
(j = 1, ..., m). Let ⟨Ax, x⟩ be a boundary point of W(A), where x = ⊕_{k=1}^{m} xk is a unit vector. We define x_{0ϕ} = x and x_{jϕ} = ⊕_{k=1}^{m} e^{i(k−1)jϕ} xk for j = 1, ..., m − 1. With these
notations, we can give a lower bound for k(A).
Proposition 3.2.1. Let A be a block shift of the above form with the corresponding notations as above. Then kxkk is equal to 1/√m for all k = 1, ..., m if and only if the vectors xpϕ, 0 ≤ p ≤ m − 1, are orthonormal. In this case, we have k(A) ≥ m .
Proof. Assume that ⟨x_{pϕ}, x_{qϕ}⟩ = 0 for 0 ≤ p ≠ q ≤ m − 1. This is equivalent to the equation

‖x1‖² + e^{i(p−q)ϕ}‖x2‖² + · · · + e^{i(m−1)(p−q)ϕ}‖xm‖² = 0

for 0 ≤ p ≠ q ≤ m − 1. That is, e^{iϕ}, ..., e^{i(m−1)ϕ} are the roots of the polynomial

‖x1‖² + ‖x2‖²t + · · · + ‖xm‖²t^{m−1}.

Hence each ‖xk‖ is equal to 1/√m for k = 1, ..., m by comparing the coefficients of the above polynomial with those of ‖xm‖² ∏_{j=1}^{m−1} (t − e^{ijϕ}). Conversely, if ‖xk‖ is equal to 1/√m for all k = 1, ..., m, then it is a routine matter to check that x_{pϕ} and x_{qϕ} are orthonormal for 0 ≤ p ≠ q ≤ m − 1. Clearly, in this case, k(A) has a lower bound m.
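A minimal numerical illustration of the proposition (NumPy assumed; the 3-cycle with unit weights, that is, m = 3 with 1-by-1 blocks, is an arbitrary choice of block shift): the boundary vector x = (1, 1, 1)/√3 has all components of norm 1/√m, and the rotated vectors x, x_ϕ, x_{2ϕ} are orthonormal with ⟨Ax_{jϕ}, x_{jϕ}⟩ = e^{ijϕ} on ∂W(A), so that k(A) ≥ m = 3.

import numpy as np

m = 3
A = np.roll(np.eye(m), 1, axis=1)                # cyclic block shift with 1-by-1 blocks and unit weights
phi = 2 * np.pi / m
x = np.ones(m) / np.sqrt(m)                      # <Ax, x> = 1 lies on ∂W(A); every ||x_k|| equals 1/sqrt(m)
# columns of X are x_{0 phi} = x, x_{phi}, x_{2 phi}, with x_{j phi} = ⊕_k e^{i(k-1) j phi} x_k
X = np.array([[np.exp(1j * k * j * phi) for k in range(m)] for j in range(m)]).T * x[:, None]
print(np.allclose(X.conj().T @ X, np.eye(m)))    # True: the three vectors are orthonormal
print(np.round([X[:, j].conj() @ A @ X[:, j] for j in range(m)], 4))   # 1, e^{i phi}, e^{2 i phi} on ∂W(A)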
Recall that the numerical radius ω(A) of a matrix A is the quantity max{|z| : z ∈ W(A)}. For a nonnegative matrix with irreducible real part, [16, Lemma 1] says that, for ω(A)e^{iθ} in W(A), where θ is a real number with e^{iθ} ≠ 1, (a) if θ is an irrational multiple of 2π, then A is permutationally similar to a matrix of the form

(1)  $$\begin{bmatrix} 0 & A_1 & & \\ & 0 & \ddots & \\ & & \ddots & A_{m-1} \\ & & & 0 \end{bmatrix} \quad (m \ge 2),$$

where the diagonal zeros are zero square matrices, and, in particular, W(A) is a circular disc centered at the origin, and (b) if θ is a rational multiple of 2π, say, θ = 2πp/q, where p and q are relatively prime integers and q ≥ 2, then A is permutationally similar to

(2)  $$\begin{bmatrix} 0 & A_1 & & \\ & 0 & \ddots & \\ & & \ddots & A_{q-1} \\ A_q & & & 0 \end{bmatrix} \quad (q \ge 2),$$

and, in particular, W(A) = e^{2πi/q}W(A).
The following lemma is a generalization of [19, Lemma 3.6], which is useful for the proof of Proposition 3.2.3. Recall that a vector x with positive components, denoted by x ≻ 0, is called positive.
Lemma 3.2.2. Let A be an n-by-n (n ≥ 2) nonnegative matrix of the form (1) with irreducible real part and m ≥ 2. Then the following hold:

(a) W(A) = {z ∈ C : |z| ≤ ω(A)}.

(b) There is a unique positive vector x = x1 ⊕ · · · ⊕ xm ∈ C^n such that ⟨Ax, x⟩ = ω(A).

(c) For any a = ω(A)e^{iθ}, θ ∈ [0, 2π), in ∂W(A), if x_θ = x1 ⊕ e^{iθ}x2 ⊕ · · · ⊕ e^{i(m−1)θ}xm, then a = ⟨Ax_θ, x_θ⟩ and Ha is generated by x_θ.

(d) Let aj = ω(A)e^{iθj} (θj ∈ [0, 2π)), j = 1, 2, be two points in ∂W(A) with the corresponding unit vectors x_{θj}. Then x_{θ1} and x_{θ2} are orthogonal to each other if and only if e^{i(θ1−θ2)} is a zero of the polynomial ‖x1‖² + ‖x2‖²t + · · · + ‖xm‖²t^{m−1}.

Proof. Since U_θ*AU_θ = e^{iθ}A for any θ, where U_θ = ⊕_{k=1}^{m} e^{i(k−1)θ}Ik, that is, A is unitarily similar to e^{iθ}A for any θ, (a) follows immediately. (b) is a consequence of [11, Proposition 3.3]. To prove (c), note that

a = ω(A)e^{iθ} = ⟨e^{iθ}Ax, x⟩ = ⟨U_θ*AU_θ x, x⟩ = ⟨A(U_θx), (U_θx)⟩ = ⟨Ax_θ, x_θ⟩,

which shows that x_θ is in Ha. That dim Ha = 1 is by [11, Corollary 3.10]. Thus Ha is generated by x_θ. (d) follows from the fact that ⟨x_{θ1}, x_{θ2}⟩ = Σ_{k=1}^{m} e^{i(k−1)(θ1−θ2)} ‖xk‖². This completes the proof.
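Part (d) can be checked quickly in the simplest case (a sketch under the assumption that NumPy is available; the 2-by-2 example is illustrative): for A = [0 1; 0 0], which is of the form (1) with m = 2 and irreducible real part, the positive vector of part (b) is x = (1, 1)/√2, so the polynomial ‖x1‖² + ‖x2‖²t has the single root t = −1, and x_{θ1} ⊥ x_{θ2} exactly when e^{i(θ1−θ2)} = −1, that is, when the two boundary points are antipodal.

import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])                        # form (1), m = 2; W(A) is the disc |z| <= w(A) = 1/2
x = np.array([1.0, 1.0]) / np.sqrt(2)             # the unique positive vector with <Ax, x> = w(A)

def x_theta(theta):
    # x_theta = x_1 ⊕ e^{i theta} x_2, as in part (c) of the lemma
    return np.array([x[0], np.exp(1j * theta) * x[1]])

print(np.round(np.vdot(x_theta(np.pi), x_theta(0.0)), 10))        # 0: antipodal points give orthogonal vectors
print(np.round(np.vdot(x_theta(np.pi / 2), x_theta(0.0)), 4))     # nonzero: non-antipodal points do not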
Thus, for a nonnegative matrix A of the form (1) with irreducible real part, k(A) equals the maximum number of θ1, ..., θk in [0, 2π) for which e^{i(θj−θl)} is a zero of p(t) ≡ ‖x1‖² + ‖x2‖²t + · · · + ‖xm‖²t^{m−1} for all j and l, 1 ≤ j ≠ l ≤ k. If m = 2,