
Figure 1. Relationship among various distances.

(ii) $x \circ (x^2 \circ y) = x^2 \circ (x \circ y)$ for all $x, y \in V$, where $x^2 := x \circ x$;

(iii) $\langle x \circ y, z \rangle = \langle x, y \circ z \rangle$ for all $x, y, z \in V$.

Here $x \circ y$ is called the Jordan product of $x$ and $y$. If a Jordan product satisfies only conditions (i) and (ii) of the above definition, the algebra $V$ is said to be a Jordan algebra. Moreover, if there is a (unique) element $e \in V$ such that $x \circ e = x$ for all $x \in V$, then $e$ is called the identity element of $V$. Note that a Jordan algebra does not necessarily have an identity element. Throughout this paper, we assume that $V$ is a Euclidean Jordan algebra with an identity element $e$.

In a given Euclidean Jordan algebra $V$, the set of squares

$$K := \{x^2 \mid x \in V\}$$

is a symmetric cone [10, Theorem III.2.1]. This means that $K$ is a self-dual closed convex cone and, for any two elements $x, y \in \operatorname{int}(K)$, there exists an invertible linear transformation $\Gamma : V \to V$ such that $\Gamma(x) = y$ and $\Gamma(K) = K$. It is well known that the second-order cone is a special symmetric cone, which is defined in $\mathbb{R}^n$ as follows:

$$K^n := \left\{ x = (x_0, \bar{x}) \in \mathbb{R} \times \mathbb{R}^{n-1} \;\middle|\; x_0 \ge \|\bar{x}\| \right\},$$

and the corresponding Jordan product of $x = (x_0, \bar{x})$ and $y = (y_0, \bar{y})$ in $\mathbb{R} \times \mathbb{R}^{n-1}$ is given by

$$x \circ y := \begin{pmatrix} x^T y \\ x_0 \bar{y} + y_0 \bar{x} \end{pmatrix}.$$

In particular, in the setting of the second-order cone $K^n$, the identity element is $e = (1, 0) \in \mathbb{R} \times \mathbb{R}^{n-1}$, where $0$ denotes the zero vector in $\mathbb{R}^{n-1}$.
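As a quick numerical sanity check of the second-order cone structure described above, the following Python sketch implements the Jordan product on $\mathbb{R} \times \mathbb{R}^{n-1}$ and tests, on random vectors, the identity element, condition (ii), condition (iii), and the fact that squares lie in $K^n$. It assumes NumPy and the standard Euclidean inner product; the helper names `jordan_product` and `in_soc` are ours and do not come from the paper.

```python
import numpy as np

def jordan_product(x, y):
    """Jordan product on R x R^{n-1}: x∘y = (x^T y, x0*ybar + y0*xbar)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    head = x @ y
    tail = x[0] * y[1:] + y[0] * x[1:]
    return np.concatenate(([head], tail))

def in_soc(x, tol=1e-12):
    """Membership test for K^n: x0 >= ||xbar|| (up to a small tolerance)."""
    x = np.asarray(x, dtype=float)
    return bool(x[0] >= np.linalg.norm(x[1:]) - tol)

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 4))      # three random vectors in R^4
e = np.array([1.0, 0.0, 0.0, 0.0])         # identity element e = (1, 0)

x2 = jordan_product(x, x)
identity_ok = np.allclose(jordan_product(x, e), x)
# Condition (ii): x∘(x²∘y) = x²∘(x∘y)
jordan_identity_ok = np.allclose(
    jordan_product(x, jordan_product(x2, y)),
    jordan_product(x2, jordan_product(x, y)),
)
# Condition (iii): <x∘y, z> = <x, y∘z> with the Euclidean inner product
associative_ok = bool(np.isclose(jordan_product(x, y) @ z, x @ jordan_product(y, z)))
square_in_cone = in_soc(x2)                # every square lies in K^n
```

The checks are numerical illustrations only; they do not replace the algebraic verification.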

For $x \in V$, we denote by $m(x)$ the degree of the minimal polynomial of $x$, that is,

$$m(x) := \min\left\{ k > 0 \;\middle|\; \{e, x, \ldots, x^k\} \text{ is linearly dependent} \right\},$$

and the rank of $V$ is well defined by $r := \max\{m(x) \mid x \in V\}$. In a Euclidean Jordan algebra $V$, an element $e^i \in V$ is an idempotent if $(e^i)^2 = e^i$, and it is a primitive idempotent if it is nonzero and cannot be written as a sum of two nonzero idempotents. The idempotents $e^i$ and $e^j$ are said to be orthogonal if $e^i \circ e^j = 0$. In addition, we say that a finite set $\{e^1, e^2, \ldots, e^r\}$ of primitive idempotents in $V$ is a Jordan frame if

$$e^i \circ e^j = 0 \ \text{ for } i \ne j, \quad \text{and} \quad \sum_{i=1}^{r} e^i = e.$$

Note that $\langle e^i, e^j \rangle = \langle e^i \circ e^j, e \rangle$ whenever $i \ne j$.

With the above, we have the spectral decomposition and the Peirce decomposition of an element $x$ in $V$.

Theorem 1.1 (The Spectral Decomposition Theorem [10, Theorem III.1.2]). Let $V$ be a Euclidean Jordan algebra. Then, there is a number $r$ such that, for every $x \in V$, there exist a Jordan frame $\{e^1, \ldots, e^r\}$ and real numbers $\lambda_1(x), \ldots, \lambda_r(x)$ with

$$x = \lambda_1(x) e^1 + \cdots + \lambda_r(x) e^r.$$

Here, the numbers $\lambda_i(x)$ ($i = 1, \ldots, r$) are the eigenvalues of $x$, and the expression $\lambda_1(x) e^1 + \cdots + \lambda_r(x) e^r$ is the spectral decomposition of $x$. Moreover, $\operatorname{tr}(x) := \sum_{i=1}^{r} \lambda_i(x)$ is called the trace of $x$, and $\det(x) := \lambda_1(x) \cdots \lambda_r(x)$ is called the determinant of $x$.
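For the second-order cone, the spectral decomposition has a well-known closed form with rank $r = 2$: $\lambda_{1,2}(x) = x_0 \pm \|\bar{x}\|$ and $e^{1,2} = \tfrac{1}{2}(1, \pm \bar{x}/\|\bar{x}\|)$. This closed form is a standard fact that is not stated in the text above, so the Python sketch below should be read as an illustration under that assumption. It assumes NumPy, and the helper names `soc_spectral` and `jordan_product` are hypothetical.

```python
import numpy as np

def jordan_product(x, y):
    """Jordan product on R x R^{n-1}: x∘y = (x^T y, x0*ybar + y0*xbar)."""
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

def soc_spectral(x):
    """Eigenvalues (l1, l2) and Jordan frame (e1, e2) of x = (x0, xbar)."""
    x = np.asarray(x, dtype=float)
    x0, xbar = x[0], x[1:]
    nrm = np.linalg.norm(xbar)
    # Any unit vector works when xbar = 0; we pick the first coordinate direction.
    w = xbar / nrm if nrm > 0 else np.eye(len(xbar))[0]
    e1 = 0.5 * np.concatenate(([1.0], w))
    e2 = 0.5 * np.concatenate(([1.0], -w))
    return (x0 + nrm, x0 - nrm), (e1, e2)

x = np.array([5.0, 1.0, 2.0, -2.0])        # x0 = 5, ||xbar|| = 3
(l1, l2), (e1, e2) = soc_spectral(x)

x_rebuilt = l1 * e1 + l2 * e2              # spectral decomposition of x
trace_x = l1 + l2                          # tr(x) = 2 * x0
det_x = l1 * l2                            # det(x) = x0^2 - ||xbar||^2

# Jordan frame properties: idempotent, mutually orthogonal, summing to e.
frame_ok = (
    np.allclose(jordan_product(e1, e1), e1)
    and np.allclose(jordan_product(e2, e2), e2)
    and np.allclose(jordan_product(e1, e2), 0.0)
    and np.allclose(e1 + e2, [1.0, 0.0, 0.0, 0.0])
)
```

For this particular $x$ the eigenvalues are $8$ and $2$, so that $\operatorname{tr}(x) = 10$ and $\det(x) = 16$.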

We point out that different elements $x$ and $y$ generally have their own Jordan frames in the spectral decomposition, which makes operations involving both $x$ and $y$ hard to handle. To overcome this difficulty, we need the so-called Peirce decomposition, in which two different elements $x$ and $y$ are expressed with respect to the same Jordan frame. We elaborate on this below.

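The Peirce decomposition: Fix a Jordan frame $\{e^1, e^2, \ldots, e^r\}$ in a Euclidean Jordan algebra $V$. For $i, j \in \{1, 2, \ldots, r\}$, we define the following eigenspaces:

$$V_{ii} := \{x \in V \mid x \circ e^i = x\} = \mathbb{R} e^i$$

and

$$V_{ij} := \left\{ x \in V \;\middle|\; x \circ e^i = \tfrac{1}{2} x = x \circ e^j \right\} \quad \text{for } i \ne j.$$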

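Theorem 1.2 ([10, Theorem IV.2.1]). The space $V$ is the orthogonal direct sum of the spaces $V_{ij}$ ($i \le j$). Furthermore,

$$V_{ij} \circ V_{ij} \subset V_{ii} + V_{jj}, \qquad V_{ij} \circ V_{jk} \subset V_{ik} \ \text{ if } i \ne k,$$

$$V_{ij} \circ V_{kl} = \{0\} \ \text{ if } \{i, j\} \cap \{k, l\} = \emptyset.$$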

Hence, given any Jordan frame $\{e^1, e^2, \ldots, e^r\}$, we can write any element $x \in V$ as

$$x = \sum_{i=1}^{r} x_i e^i + \sum_{i<j} x_{ij},$$

where $x_i \in \mathbb{R}$ and $x_{ij} \in V_{ij}$. The expression $\sum_{i=1}^{r} x_i e^i + \sum_{i<j} x_{ij}$ is called the Peirce decomposition of $x$.
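To make the Peirce decomposition concrete, consider the algebra of real symmetric matrices with $x \circ y = (xy + yx)/2$, a standard example of a Euclidean Jordan algebra that is not treated explicitly above. Assuming an orthonormal basis $u_1, \ldots, u_r$, the matrices $e^i = u_i u_i^T$ form a Jordan frame, and the Peirce components are $x_i = u_i^T x u_i$ and $x_{ij} = (u_i^T x u_j)(u_i u_j^T + u_j u_i^T)$. The Python sketch below (NumPy; all names are illustrative) checks these claims numerically.

```python
import numpy as np

def jp(a, b):
    """Jordan product on symmetric matrices: a∘b = (ab + ba)/2."""
    return 0.5 * (a @ b + b @ a)

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))          # orthonormal columns u_1, u_2, u_3
frame = [np.outer(Q[:, i], Q[:, i]) for i in range(3)]    # Jordan frame e^i = u_i u_i^T

s = rng.standard_normal((3, 3))
x = s + s.T                                               # a random symmetric matrix
c = Q.T @ x @ Q                                           # coordinates of x in the basis u_i

# Diagonal part sum_i x_i e^i and off-diagonal Peirce components x_ij (i < j).
diag_part = sum(c[i, i] * frame[i] for i in range(3))
off = {
    (i, j): c[i, j] * (np.outer(Q[:, i], Q[:, j]) + np.outer(Q[:, j], Q[:, i]))
    for i in range(3) for j in range(i + 1, 3)
}
x_rebuilt = diag_part + sum(off.values())

# x_12 lies in V_12: x_12∘e^1 = x_12/2 = x_12∘e^2, and x_12∘e^3 = 0.
x12 = off[(0, 1)]
peirce_ok = (
    np.allclose(jp(x12, frame[0]), 0.5 * x12)
    and np.allclose(jp(x12, frame[1]), 0.5 * x12)
    and np.allclose(jp(x12, frame[2]), 0.0)
)
frame_sums_to_identity = np.allclose(sum(frame), np.eye(3))
```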


Theorem 1.3 ([22, Theorem 4.6]). Suppose that $V$ is simple and $\{e^1, \ldots, e^r\}$ is any fixed Jordan frame in $V$. Let $z = \sum_{i=1}^{r} z_i e^i + \sum_{i<j} z_{ij} \in K$. Then, we have

$$\sum_{i=1}^{r} z_i^p \le \operatorname{tr}(z^p) \ \text{ for } p > 1 \quad \text{and} \quad \sum_{i=1}^{r} z_i^p \ge \operatorname{tr}(z^p) \ \text{ for } 0 < p < 1,$$

where the equalities hold if and only if $z = \sum_{i=1}^{r} z_i e^i$.

In a Euclidean Jordan algebra $V$, for any $x \in V$, the linear transformation $L(x) : V \to V$ defined by $L(x)(y) := x \circ y$ for all $y \in V$ is called the Lyapunov transformation. The so-called quadratic representation $P(x)$ is defined by $P(x) := 2L^2(x) - L(x^2)$. For any $x \in V$, the endomorphisms $L(x)$ and $P(x)$ are self-adjoint.

We say that two elements $x$ and $y$ of a Euclidean Jordan algebra $V$ operator commute if $x \circ (y \circ z) = y \circ (x \circ z)$ for all $z \in V$, which is equivalent to stating that $L(x)L(y) = L(y)L(x)$. For the quadratic representation $P(x)$, if $x$ is invertible, then we have

$$P(x)K = K \quad \text{and} \quad P(x)\operatorname{int}(K) = \operatorname{int}(K).$$

Below is a useful property regarding the quadratic representation P (x), which is needed for our subsequent analysis.

Theorem 1.4 ([12, Proposition 2.5]). Suppose that $\{e^1, e^2, \ldots, e^r\}$ is a Jordan frame in $V$ and the spectral decomposition of $x$ can be expressed as $x = \lambda_1(x) e^1 + \cdots + \lambda_r(x) e^r$. For any $z \in V$, if the Peirce decomposition of $z$ is $z = \sum_{i=1}^{r} z_i e^i + \sum_{i<j} z_{ij}$, we have

$$P(x)z = \sum_{i=1}^{r} \lambda_i(x)^2 z_i e^i + \sum_{i<j} \lambda_i(x) \lambda_j(x) z_{ij}.$$
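The formula in Theorem 1.4 can be illustrated in the algebra of real symmetric matrices, where $x \circ y = (xy + yx)/2$ and the quadratic representation is known to reduce to $P(x)z = xzx$. Both facts are standard but are assumptions relative to the text above. In the eigenbasis of $x$, the theorem then says that the $(i, j)$ entry of $z$ is scaled by $\lambda_i(x)\lambda_j(x)$. The following Python sketch (NumPy; the function names `L` and `P` mirror the notation but are our own) checks this numerically.

```python
import numpy as np

def L(x):
    """Lyapunov transformation y -> x∘y on symmetric matrices, x∘y = (xy + yx)/2."""
    return lambda y: 0.5 * (x @ y + y @ x)

def P(x):
    """Quadratic representation P(x) = 2 L(x)^2 - L(x^2)."""
    return lambda y: 2.0 * L(x)(L(x)(y)) - L(x @ x)(y)

rng = np.random.default_rng(1)
a = rng.standard_normal((4, 4))
b = rng.standard_normal((4, 4))
x = a + a.T                                # random symmetric x
z = b + b.T                                # random symmetric z

# In S^n the definition 2L(x)^2 - L(x^2) collapses to z -> x z x.
closed_form_ok = np.allclose(P(x)(z), x @ z @ x)

# Theorem 1.4 in coordinates: work in the eigenbasis of x (frame e^i = u_i u_i^T).
lam, U = np.linalg.eigh(x)
z_coords = U.T @ z @ U                     # Peirce coordinates of z
lhs = U.T @ P(x)(z) @ U                    # Peirce coordinates of P(x) z
rhs = np.outer(lam, lam) * z_coords        # entry (i, j) scaled by lam_i * lam_j
peirce_formula_ok = np.allclose(lhs, rhs)
```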

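In light of the trace function $\operatorname{tr}(\cdot)$, a semi-distance function associated with the symmetric cone was proposed in [16]:

$$d(x, y) := \operatorname{tr}(x + y) - 2 \operatorname{tr}\left[ \left( P(x^{1/2}) y \right)^{1/2} \right], \quad \text{for } x, y \in K. \tag{1.3}$$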

When the symmetric cone reduces to the second-order cone $K^n$, the function $d(x, y)$ was slightly modified into the distance function (1.4) below, which was further proved to be a proximal distance in [16]. When the symmetric cone reduces to the cone of positive semidefinite matrices, the function $d(x, y)$ corresponds to the matrix distance proposed by Givens and Shortt [11]:

$$d(A, B) := \operatorname{tr}(A + B) - 2 \operatorname{tr}\left[ \left( A^{1/2} B A^{1/2} \right)^{1/2} \right].$$
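The matrix distance of Givens and Shortt is straightforward to evaluate numerically. The following Python sketch is one possible implementation, assuming NumPy and computing matrix square roots through an eigendecomposition; the helper names `psd_sqrt` and `gs_distance` are ours. It also checks the three semi-distance properties on a random pair of positive definite matrices, and the commuting (diagonal) case, where the formula reduces to $\sum_i (\sqrt{a_i} - \sqrt{b_i})^2$.

```python
import numpy as np

def psd_sqrt(a):
    """Symmetric square root of a positive semidefinite matrix via eigendecomposition."""
    a = 0.5 * (a + a.T)                    # remove round-off asymmetry
    w, u = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)              # guard against tiny negative eigenvalues
    return (u * np.sqrt(w)) @ u.T

def gs_distance(a, b):
    """d(A, B) = tr(A + B) - 2 tr((A^{1/2} B A^{1/2})^{1/2})."""
    ra = psd_sqrt(a)
    return np.trace(a + b) - 2.0 * np.trace(psd_sqrt(ra @ b @ ra))

rng = np.random.default_rng(2)
g = rng.standard_normal((3, 3))
h = rng.standard_normal((3, 3))
A = g @ g.T + np.eye(3)                    # well-conditioned positive definite matrices
B = h @ h.T + np.eye(3)

d_ab = gs_distance(A, B)
d_ba = gs_distance(B, A)
d_aa = gs_distance(A, A)

# Commuting case: diagonal matrices give sum_i (sqrt(a_i) - sqrt(b_i))^2.
d_diag = gs_distance(np.diag([1.0, 4.0]), np.diag([9.0, 16.0]))   # (1-3)^2 + (2-4)^2 = 8
```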

Theorem 1.5 ([16, Theorem 2.3]). Let $d(\cdot, \cdot)$ be defined as in (1.3). For any $x, y \in K$, assume that $x$ and $y$ operator commute. Then, the function $d(\cdot, \cdot)$ is a semi-distance, i.e.,

(a) $d(x, y) \ge 0$;

(b) $d(x, y) = 0$ if and only if $x = y$;

(c) $d(x, y) = d(y, x)$.


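As mentioned earlier, the function $d(x, y)$ was slightly modified in the setting of the second-order cone in [16] so that it becomes a proximal distance. In particular, for any $x, y \in \mathbb{R}^n$, define $d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+ \cup \{+\infty\}$ by

$$d(x, y) := \begin{cases} \operatorname{tr}(x + y) - 2 \operatorname{tr}\left[ \left( P(x^{1/2}) y \right)^{1/2} \right] & \forall x \in \operatorname{int}(K^n),\ y \in K^n, \\ +\infty & \text{otherwise.} \end{cases} \tag{1.4}$$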

This modiﬁed function d(x, y) is a proximal distance on int(Kⁿ), see [16, Theorem 3.7].
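The extended-valued function (1.4) can be evaluated directly on the second-order cone. The Python sketch below is one way to do so, under two assumptions that are standard but not spelled out above: the eigenvalues of $x = (x_0, \bar{x})$ are $x_0 \pm \|\bar{x}\|$ (so $\operatorname{tr}(x) = 2x_0$), and $L(x)$ is represented by the arrow-shaped matrix with first row $(x_0, \bar{x}^T)$. It uses NumPy, and all function names are hypothetical.

```python
import numpy as np

def soc_sqrt(x):
    """Square root of x in K^n through its spectral decomposition."""
    x0, xbar = x[0], x[1:]
    n = np.linalg.norm(xbar)
    w = xbar / n if n > 0 else np.zeros_like(xbar)
    s1 = np.sqrt(max(x0 + n, 0.0))         # clip round-off below zero
    s2 = np.sqrt(max(x0 - n, 0.0))
    return np.concatenate(([0.5 * (s1 + s2)], 0.5 * (s1 - s2) * w))

def arrow(x):
    """Matrix of the Lyapunov transformation L(x): y -> x∘y."""
    m = x[0] * np.eye(len(x))
    m[0, 1:] = x[1:]
    m[1:, 0] = x[1:]
    return m

def quad_rep(x):
    """Matrix of P(x) = 2 L(x)^2 - L(x∘x)."""
    Lx = arrow(x)
    return 2.0 * Lx @ Lx - arrow(Lx @ x)

def soc_trace(x):
    """tr(x) = lambda_1(x) + lambda_2(x) = 2 * x0."""
    return 2.0 * x[0]

def d(x, y):
    """Function (1.4): finite only for x in int(K^n) and y in K^n."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_interior = x[0] > np.linalg.norm(x[1:])
    y_in_cone = y[0] >= np.linalg.norm(y[1:])
    if not (x_interior and y_in_cone):
        return np.inf
    w = quad_rep(soc_sqrt(x)) @ y          # P(x^{1/2}) y
    return soc_trace(x + y) - 2.0 * soc_trace(soc_sqrt(w))

x = [2.0, 1.0, 0.0]
y = [3.0, 0.0, 1.0]
d_xy, d_yx, d_yy = d(x, y), d(y, x), d(y, y)
d_boundary = d([1.0, 1.0, 0.0], y)         # x on the boundary of K^n, so +inf
```

For interior points the value is finite, nonnegative, and symmetric in its arguments, while any first argument outside $\operatorname{int}(K^n)$ yields $+\infty$.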

Theorem 1.6 ([16, Theorem 3.7]). Let the function $d(\cdot, \cdot)$ be defined by (1.4). Then, the function $d(\cdot, \cdot)$ is a proximal distance with respect to $\operatorname{int}(K^n)$, i.e.,

(a) $d(\cdot, y)$ is proper, l.s.c., convex, and continuously differentiable on $\operatorname{int}(K^n)$;

(b) $\operatorname{dom} d(\cdot, y) \subset \operatorname{int}(K^n)$ and $\operatorname{dom} \partial_1 d(\cdot, y) = \operatorname{int}(K^n)$, where the symbol $\partial_1 d(\cdot, y)$ denotes the classical subgradient map of the function $d(\cdot, y)$ with respect to the first variable;

(c) $d(\cdot, y)$ is level bounded on $\mathbb{R}^n$, i.e., $\lim_{\|u\| \to \infty} d(u, y) = +\infty$;

(d) $d(y, y) = 0$.

In this short paper, we improve these two results by showing that, without assuming that $x$ and $y$ operator commute, the function $d(\cdot, \cdot)$ is a semi-distance, and that a suitably modified function $d(\cdot, \cdot)$ is a proximal distance in the setting of the symmetric cone. These generalizations make the function applicable to proximal-like algorithms for nonlinear symmetric cone programming.

2. Main results

In this section, we establish our main results without assuming that the elements operator commute.

The difficulty is that, without the operator commutativity condition, two elements $x$ and $y$ in $V$ need not share the same Jordan frame. Our idea for tackling this is to use the spectral decomposition of $x$ while employing the Peirce decomposition of $y$ with respect to the same Jordan frame. Together with the quadratic representation $P(x)$, this paves the way for the analysis.

Theorem 2.1. Let $d(\cdot, \cdot)$ be defined as in (1.3). For any $x, y \in K$, the function $d(\cdot, \cdot)$ is a semi-distance, i.e., the following hold:

(a) $d(x, y) \ge 0$;

(b) $d(x, y) = 0$ if and only if $x = y$;

(c) $d(x, y) = d(y, x)$.

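Proof. (a) Let $\{e^1, e^2, \ldots, e^r\}$ be a Jordan frame in $V$ that gives the spectral decomposition of $x$. With respect to this frame, we write out the spectral decomposition of $x$ and the Peirce decomposition of $y$, respectively, as below:

$$x = \lambda_1(x) e^1 + \cdots + \lambda_r(x) e^r, \qquad y = y_1 e^1 + \cdots + y_r e^r + \sum_{i<j} y_{ij}.$$

Based on the spectral decomposition of $x$, it follows that $x^{1/2} = \sqrt{\lambda_1(x)}\, e^1 + \cdots + \sqrt{\lambda_r(x)}\, e^r$. Combining this with Theorem 1.4, we obtain

$$P(x^{1/2}) y = \lambda_1(x) y_1 e^1 + \cdots + \lambda_r(x) y_r e^r + \sum_{i<j} \sqrt{\lambda_i(x) \lambda_j(x)}\, y_{ij}.$$

Then, applying Theorem 1.3 with $p = 1/2$, we have

$$\operatorname{tr}\left[ \left( P(x^{1/2}) y \right)^{1/2} \right] \le \sum_{i=1}^{r} \sqrt{\lambda_i(x) y_i}.$$

According to this, for any $x, y \in K$, we achieve

$$\begin{aligned} d(x, y) &= \operatorname{tr}(x + y) - 2 \operatorname{tr}\left[ \left( P(x^{1/2}) y \right)^{1/2} \right] \\ &\ge \operatorname{tr}(x) + \operatorname{tr}(y) - 2 \sum_{i=1}^{r} \sqrt{\lambda_i(x) y_i} \\ &\ge \sum_{i=1}^{r} \lambda_i(x) + \sum_{i=1}^{r} y_i - 2 \sum_{i=1}^{r} \sqrt{\lambda_i(x) y_i} \\ &= \sum_{i=1}^{r} \left( \sqrt{\lambda_i(x)} - \sqrt{y_i} \right)^2 \\ &\ge 0, \end{aligned}$$

where the second inequality follows from [12, Corollary 4.6]. Hence, we prove that $d(x, y) \ge 0$.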
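The chain of inequalities in part (a) can be checked numerically in the simple algebra of real symmetric matrices, where (as a standard fact assumed here, not taken from the text) $P(x^{1/2})y = x^{1/2} y x^{1/2}$, the Jordan frame of $x$ is $e^i = u_i u_i^T$ for orthonormal eigenvectors $u_i$, and the Peirce coefficients of $y$ are $y_i = u_i^T y u_i$. The Python sketch below (NumPy; names are illustrative) evaluates each quantity for a random pair of positive definite matrices that do not commute in general.

```python
import numpy as np

def psd_sqrt(a):
    """Symmetric square root of a positive semidefinite matrix."""
    a = 0.5 * (a + a.T)
    w, u = np.linalg.eigh(a)
    return (u * np.sqrt(np.clip(w, 0.0, None))) @ u.T

rng = np.random.default_rng(3)
g = rng.standard_normal((4, 4))
h = rng.standard_normal((4, 4))
x = g @ g.T + np.eye(4)
y = h @ h.T + np.eye(4)

lam, U = np.linalg.eigh(x)                 # spectral decomposition of x
y_diag = np.diag(U.T @ y @ U)              # Peirce coefficients y_i of y in the frame of x

rx = psd_sqrt(x)
tr_sqrt = np.trace(psd_sqrt(rx @ y @ rx))  # tr (P(x^{1/2}) y)^{1/2}
upper = np.sum(np.sqrt(lam * y_diag))      # sum_i sqrt(lambda_i(x) y_i)

d_xy = np.trace(x + y) - 2.0 * tr_sqrt
lower = np.sum((np.sqrt(lam) - np.sqrt(y_diag)) ** 2)
```

Here `tr_sqrt <= upper` corresponds to the application of Theorem 1.3, and `d_xy >= lower >= 0` corresponds to the final estimate of the proof.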

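(b) From the proof of part (a), we know that

$$d(x, y) = \operatorname{tr}(x + y) - 2 \operatorname{tr}\left[ \left( P(x^{1/2}) y \right)^{1/2} \right] \ge \operatorname{tr}(x) + \operatorname{tr}(y) - 2 \sum_{i=1}^{r} \sqrt{\lambda_i(x) y_i} \ge \sum_{i=1}^{r} \left( \sqrt{\lambda_i(x)} - \sqrt{y_i} \right)^2 \ge 0.$$

Hence, it follows from $d(x, y) = 0$ that

$$\operatorname{tr}(x + y) - 2 \operatorname{tr}\left[ \left( P(x^{1/2}) y \right)^{1/2} \right] = \operatorname{tr}(x) + \operatorname{tr}(y) - 2 \sum_{i=1}^{r} \sqrt{\lambda_i(x) y_i} \quad \text{and} \quad \sum_{i=1}^{r} \left( \sqrt{\lambda_i(x)} - \sqrt{y_i} \right)^2 = 0.$$

These lead to

$$\operatorname{tr}\left[ \left( P(x^{1/2}) y \right)^{1/2} \right] = \sum_{i=1}^{r} \sqrt{\lambda_i(x) y_i} \quad \text{and} \quad \sqrt{\lambda_i(x)} = \sqrt{y_i} \quad \forall i = 1, \ldots, r.$$

In addition, applying Theorem 1.3 yields

$$x = \sum_{i=1}^{r} \lambda_i(x) e^i = \sum_{i=1}^{r} y_i e^i = y.$$

Conversely, if $x = y$, then $P(x^{1/2}) x = x^2$, so $d(x, x) = 2 \operatorname{tr}(x) - 2 \operatorname{tr}(x) = 0$. Therefore, $d(x, y) = 0$ if and only if $x = y$.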

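(c) First, from [14, Proposition 3.2], for any $x, y \in K$, we have

$$\lambda_i\left( P(x^{1/2}) y \right) = \lambda_i\left( P(y^{1/2}) x \right) \quad \text{for } i = 1, \ldots, r.$$

This leads to

$$\lambda_i\left( \left( P(x^{1/2}) y \right)^{1/2} \right) = \lambda_i\left( \left( P(y^{1/2}) x \right)^{1/2} \right) \quad \forall i = 1, \ldots, r.$$

Hence, it follows that $\operatorname{tr}\left[ \left( P(x^{1/2}) y \right)^{1/2} \right] = \operatorname{tr}\left[ \left( P(y^{1/2}) x \right)^{1/2} \right]$, which implies that

$$d(x, y) = \operatorname{tr}(x + y) - 2 \operatorname{tr}\left[ \left( P(x^{1/2}) y \right)^{1/2} \right] = \operatorname{tr}(y + x) - 2 \operatorname{tr}\left[ \left( P(y^{1/2}) x \right)^{1/2} \right] = d(y, x).$$

Then, the proof is complete. □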

Theorem 2.2. Let $d(\cdot, \cdot)$ be defined as in (1.3). Then, the function $d(x, y)$ is convex in one argument whenever the other argument is fixed, that is, for any fixed $x \in K$ or any fixed $y \in K$.

Proof. The proof is similar to that of [16, Theorem 2.4]. Hence, we omit it. □

From Theorem 2.1 and Theorem 2.2, we have shown that the function $d(\cdot, \cdot)$ defined as in (1.3) is a convex semi-distance associated with the symmetric cone. However, as indicated in [16], it can be verified by using the convexity of $d(\cdot, \cdot)$ that the triangle inequality fails. To see this, for any given $x, y \in K$, take $z = \lambda x + (1 - \lambda) y$ with $0 < \lambda < 1$. Then we have

$$d(x, z) = d\left( x, \lambda x + (1 - \lambda) y \right) \le \lambda d(x, x) + (1 - \lambda) d(x, y) = (1 - \lambda) d(x, y), \tag{2.1}$$

$$d(z, y) = d\left( \lambda x + (1 - \lambda) y, y \right) \le \lambda d(x, y) + (1 - \lambda) d(y, y) = \lambda d(x, y). \tag{2.2}$$

Then, adding (2.1) and (2.2) together yields

$$d(x, z) + d(z, y) \le d(x, y).$$

In other words, the semi-distance $d(\cdot, \cdot)$ defined as in (1.3) cannot be a "distance function" (metric), since the triangle inequality is violated whenever the above inequality is strict. Thus, we turn our attention to the possibility of $d(\cdot, \cdot)$ becoming a proximal distance.
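A concrete instance of this failure appears already in the simplest case $V = \mathbb{R}$ with the usual product, where $K = [0, \infty)$, $P(x^{1/2})y = xy$, and (1.3) reduces to $d(x, y) = x + y - 2\sqrt{xy} = (\sqrt{x} - \sqrt{y})^2$. This rank-one specialization is our own illustration rather than an example from the text. The Python sketch below evaluates both sides for $x = 0$, $y = 4$, and $\lambda = 1/2$.

```python
import math

def d(x, y):
    """Rank-one case V = R, K = [0, inf): d(x, y) = x + y - 2*sqrt(x*y)."""
    return x + y - 2.0 * math.sqrt(x * y)

x, y, lam = 0.0, 4.0, 0.5
z = lam * x + (1.0 - lam) * y              # z = 2, a point between x and y

left = d(x, z) + d(z, y)                   # 2 + (sqrt(2) - 2)^2, roughly 2.34
right = d(x, y)                            # (0 - 2)^2 = 4

# Bounds (2.1) and (2.2) for this choice of x, y, and lambda.
bound_21_ok = d(x, z) <= (1.0 - lam) * d(x, y) + 1e-12
bound_22_ok = d(z, y) <= lam * d(x, y) + 1e-12
```

Since `left` is strictly smaller than `right`, the triangle inequality $d(x, y) \le d(x, z) + d(z, y)$ fails for this triple.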

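In order to show that $d(\cdot, \cdot)$ can become a proximal distance, we need to modify it slightly. For any $x, y \in \mathbb{R}^n$, we define $d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+ \cup \{+\infty\}$ by

$$d(x, y) := \begin{cases} \operatorname{tr}(x + y) - 2 \operatorname{tr}\left[ \left( P(x^{1/2}) y \right)^{1/2} \right] & \forall x \in \operatorname{int} K,\ y \in K, \\ +\infty & \text{otherwise.} \end{cases} \tag{2.3}$$

The above function $d(\cdot, \cdot)$ is different from the ones given in [1]. To the best of our knowledge, it may be the only proximal distance that is not induced by a Bregman distance or a φ-divergence. This function, as will be shown below, is a proximal distance on $\operatorname{int} K$.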

Theorem 2.3. Let $d(\cdot, \cdot)$ be defined as in (2.3) in the setting of the symmetric cone. Then, the function $d(\cdot, \cdot)$ is a proximal distance, i.e., it satisfies

(a) $d(\cdot, y)$ is proper, l.s.c., convex, and continuously differentiable on $\operatorname{int} K$;

(b) $\operatorname{dom} d(\cdot, y) \subset \operatorname{int} K$ and $\operatorname{dom} \partial_1 d(\cdot, y) = \operatorname{int} K$, where $\partial_1 d(\cdot, y)$ denotes the classical subgradient map of the function $d(\cdot, y)$ with respect to the first variable;

(c) $d(\cdot, y)$ is level bounded on $\mathbb{R}^n$, i.e., $\lim_{\|u\| \to \infty} d(u, y) = +\infty$;

(d) $d(y, y) = 0$.

Proof. (a) The proof is similar to that of [5, Lemma 3.1]; we omit the details.

(b) The arguments are similar to those of [16, Proposition 3.5], because only the general cone structure is used there. We also omit them.

(c) Suppose $y \in \operatorname{int} K$. For any $x \in \operatorname{int} K$, as in the proof of Theorem 2.1, we write out the spectral decomposition of $x$ and the Peirce decomposition of $y$, respectively, i.e.,

$$x = \lambda_1(x) e^1 + \cdots + \lambda_r(x) e^r \quad \text{and} \quad y = y_1 e^1 + \cdots + y_r e^r + \sum_{i<j} y_{ij}.$$

Note that

$$\|x\|^2 = \lambda_1(x)^2 \|e^1\|^2 + \cdots + \lambda_r(x)^2 \|e^r\|^2 \le r \lambda_1(x)^2 \|e^1\|^2,$$

where the inequality holds because $\|e^i\|$ is the same constant for every primitive idempotent $e^i$ ($i = 1, \ldots, r$) in $V$ and $\lambda_1(x) \ge \cdots \ge \lambda_r(x) \ge 0$. Hence, it is easy to check that $\lambda_1(x) \to \infty$ as $\|x\| \to \infty$. From this and the proof of part (a) in Theorem 2.1, we have

$$d(x, y) \ge \sum_{i=1}^{r} \left( \sqrt{\lambda_i(x)} - \sqrt{y_i} \right)^2 \ge \left( \sqrt{\lambda_1(x)} - \sqrt{y_1} \right)^2 \to \infty.$$

It follows that $d(x, y) \to \infty$ as $\|x\| \to \infty$ for any $x \in \operatorname{int} K$. Moreover, $d(x, y) = \infty$ when $x \notin \operatorname{int} K$. Thus, $d(x, y) \to \infty$ as $\|x\| \to \infty$ for any $x \in \mathbb{R}^n$, and we conclude that $d(\cdot, y)$ is level bounded on $\mathbb{R}^n$.

(d) This property is trivial.

To sum up, the function $d(\cdot, \cdot)$ defined as in (2.3) is a proximal distance in the setting of the symmetric cone. □

Remark 2.4. We say a few words about Theorem 2.1 and Theorem 2.3. When the symmetric cone $K$ reduces to the second-order cone $K^n$, the conclusions of Theorem 2.1 and Theorem 2.3 correspond to the contents of Theorem 2.5 and Theorem 3.7 in [16], respectively. In other words, our results generalize Theorem 2.5 and Theorem 3.7 in [16] to a broader framework.


3. Concluding remarks

In this paper, we study a semi-distance associated with the symmetric cone $K$. Furthermore, based on it, we construct a proximal distance on $\operatorname{int} K$, which also answers a question raised in [16]. Finally, we would like to point out some possible future directions, as mentioned in [16].

• Can the function d(·, ·) further become a Bregman distance or φ-divergence?

• Can the function $d(\cdot, \cdot)$ be extended to the nonsymmetric cone setting? In particular, for the circular cone $L_\theta$, one type of spectral decomposition of $x$ and some differentiability properties of $\lambda_i(x)$ are already known; see [24]. Using these facts, one may consider constructing an analogous distance function $d(\cdot, \cdot)$ in the setting of the circular cone.

References

[1] A. Auslender and M. Teboulle, Interior gradient and proximal methods for convex and conic optimization, SIAM J. Optim. 16 (2006), 697–725.

[2] A. Banerjee, I. Dhillon and J. Ghosh, Clustering with Bregman divergences, J. Mach. Learn. Res. 6 (2005), 1705–1749.

[3] A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu and D. Modha, A generalized maximum entropy approach to Bregman co-clustering and matrix approximations, J. Mach. Learn. Res. 8 (2007), 1919–1986.

[4] Y. Censor and A. Zenios, Proximal minimization algorithms with D-functions, J. Optim. Theory Appl. 73 (1992), 451–464.

[5] Y.-L. Chang and J.-S. Chen, Convexity of symmetric cone trace functions in Euclidean Jordan algebras, J. Nonlinear Convex Anal. 14 (2013), 53–61.

[6] J.-S. Chen and S. Pan, Proximal-like algorithm using the quasi D-function for convex second-order cone programming, J. Optim. Theory Appl. 138 (2008), 95–113.

[7] G. Chen and M. Teboulle, Convergence analysis of proximal-like minimization algorithm using Bregman functions, SIAM J. Optim. 3 (1993), 538–543.

[8] I. Dhillon and S. Sra, Generalized nonnegative matrix approximations with Bregman divergences, in: NIPS'05: Proceedings of the 18th International Conference on Neural Information Processing Systems, 2005, pp. 283–290.

[9] S. Dhillon and J. A. Tropp, Matrix nearness problems with Bregman divergences, SIAM J. Matrix Anal. Appl. 29 (2007), 1120–1146.

[10] J. Faraut and A. Korányi, Analysis on Symmetric Cones, Oxford Mathematical Monographs, Oxford University Press, New York, 1994.

[11] C. R. Givens and R. M. Shortt, A class of Wasserstein metrics for probability distributions, Mich. Math. J. 31 (1984), 231–240.

[12] M. S. Gowda and J. Tao, The Cauchy interlacing theorem in simple Euclidean Jordan algebras and some consequences, Linear Multilinear Algebra 59 (2011), 65–86.

[13] K. C. Kiwiel, Proximal minimization methods with generalized Bregman functions, SIAM J. Control Optim. 35 (1997), 1142–1168.

[14] J. Kim and Y. Lim, Jordan automorphic generators of Euclidean Jordan algebras, J. Korean Math. Soc. 43 (2006), 507–528.

[15] B. Kulis, M. Sustik and I. Dhillon, Low-rank kernel learning with Bregman matrix divergences, J. Mach. Learn. Res. 10 (2009), 341–376.

[16] X.-H. Miao, C.-H. Huang, Y. Lim and J.-S. Chen, A semi-distance associated with symmetric cone and a new proximal distance function on second-order cone, Linear and Nonlinear Anal. 5 (2019), 421–437.

[17] S. Pan and J.-S. Chen, A class of interior proximal-like algorithms for convex second-order cone programming, SIAM J. Optim. 19 (2008), 883–910.

[18] E. A. Papa Quiroz, An extension of the proximal point algorithm with Bregman distances on Hadamard manifolds, J. Global Optim. 56 (2013), 43–59.

[19] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.

[20] W. Stummer and I. Vajda, On Bregman distances and divergences of probability measures, IEEE Trans. Inf. Theory 58 (2012), 1277–1288.

[21] M. Teboulle, Entropic proximal mappings with applications to nonlinear programming, Math. Oper. Res. 17 (1992), 670–690.

[22] J. Tao, L. C. Kong, Z. Y. Luo and N. H. Xiu, Some majorization inequalities in Euclidean Jordan algebras, Linear Algebra Appl. 461 (2014), 92–122.

[23] K. Tsuda, G. Ratsch and M. K. Warmuth, Matrix exponentiated gradient updates for on-line learning and Bregman projection, J. Mach. Learn. Res. 6 (2005), 995–1018.

[24] J. C. Zhou, J. Y. Tang and J.-S. Chen, Parabolic second-order directional differentiability in the Hadamard sense of the vector-valued functions associated with circular cones, J. Optim. Theory Appl. 172 (2017), 802–823.

Manuscript received December 25, 2020 revised December 19, 2021

X. Miao

School of Mathematics, Tianjin University, Tianjin 300072, P.R. China E-mail address: [email protected]

J.-S. Chen

Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan E-mail address: [email protected]