{{Short description|Norm on a vector space of matrices}}
{{For|the general concept|Norm (mathematics)}}
{{multiple issues|
{{lead too short|date=March 2023}}
{{context|date=March 2023}}}}

In the field of [[mathematics]], [[vector norm|norms]] are defined for elements within a [[vector space]]. Specifically, when the vector space comprises matrices, such norms are referred to as '''matrix norms'''. Matrix norms differ from vector norms in that they must also interact with matrix multiplication.

==Preliminaries==
Given a [[field (mathematics)|field]] <math>\ K\ </math> of either [[real number|real]] or [[complex number]]s (or any complete subset thereof), let <math>\ K^{m \times n}\ </math> be the {{mvar|K}}-[[vector space]] of matrices with <math>m</math> rows and <math>n</math> columns and entries in the field <math>\ K ~.</math> A matrix norm is a [[Norm (mathematics)|norm]] on <math>\ K^{m \times n}~.</math>

Norms are often expressed with [[double vertical bar]]s (like so: <math>\ \|A\|\ </math>).  Thus, the matrix norm is a [[Function (mathematics)|function]] <math>\ \|\cdot\| : K^{m \times n} \to \R^{0+}\ </math> that must satisfy the following properties:<ref name=":0">{{cite web |last=Weisstein |first=Eric W. |title=Matrix norm |website=mathworld.wolfram.com |url=https://mathworld.wolfram.com/MatrixNorm.html |access-date=2020-08-24 |lang=en }}</ref><ref name=":1">{{Cite web |title=Matrix norms |website=fourier.eng.hmc.edu |url=http://fourier.eng.hmc.edu/e161/lectures/algebra/node12.html | access-date=2020-08-24 }}</ref>

For all scalars <math>\ \alpha \in K\ </math> and matrices <math>\ A, B \in K^{m \times n}\ ,</math>
* <math> \|A\| \ge 0\ </math> (''positive-valued'')
* <math> \|A\| = 0 \iff A=0_{m,n}</math> (''definite'')
* <math> \left\| \alpha\ A \right\| = \left| \alpha \right|\ \left\|A\right\|\ </math> (''absolutely homogeneous'')
* <math> \| A + B \| \le \| A \| + \| B \|\ </math> (''sub-additive'' or satisfying the ''triangle inequality'')

The only feature distinguishing matrices from rearranged vectors is [[matrix multiplication|multiplication]]. Matrix norms are particularly useful if they are also '''sub-multiplicative''':<ref name=":0"/><ref name=":1"/><ref>{{cite journal |last=Malek-Shahmirzadi |first=Massoud |year=1983 |title=A characterization of certain classes of matrix norms |journal=Linear and Multilinear Algebra |volume=13 |issue=2 |pages=97–99 | doi=10.1080/03081088308817508| issn=0308-1087 |lang=en }}</ref>

* <math>\ \left\| AB \right\| \le \left\| A \right\| \left\| B \right\|\ </math>{{efn|group=Note|
The condition only applies when the product is defined, such as the case of [[Square matrix|square matrices]] (<math>\ m = n\ </math>). More generally, multiplication of the matrices must be possible: <math>\ A \in K^{\ell \times m}\ </math> and <math>\ B \in K^{m \times n} ~;</math> further, the two norms <math>\ \|A\|\ </math> and <math>\ \|B\|\ </math> must either have the same definition, differing only in the matrix dimensions, or be two different types of norms that are nonetheless "consistent" (see below).
}}

Every norm on <math>\ K^{n\times n}\ </math> can be rescaled to be sub-multiplicative; in some books, the terminology ''matrix norm'' is reserved for sub-multiplicative norms.<ref>{{cite book |last=Horn |first=Roger A. |year=2012 |title=Matrix analysis |edition=2nd |publisher=Cambridge University Press |location=Cambridge, UK |others=Johnson, Charles R. |isbn=978-1-139-77600-4 |oclc=817236655 |pages=340–341 }}</ref>

==Matrix norms induced by vector norms==
{{Main|Operator norm}}
Suppose a [[vector norm]] <math>\|\cdot\|_{\alpha}</math> on <math>K^n</math> and a vector norm <math>\|\cdot\|_{\beta}</math> on <math>K^m</math> are given. Any <math>m \times n</math> matrix {{mvar|A}} induces a linear operator from <math>K^n</math> to <math>K^m</math> with respect to the standard basis, and one defines the corresponding ''induced norm'' or ''[[operator norm]]'' or ''subordinate norm'' on the space <math>K^{m \times n}</math> of all <math>m \times n</math> matrices as follows:
<math display="block">
\|A\|_{\alpha, \beta} = \sup\{ \|Ax\|_\beta : x \in K^n \text{ such that } \|x\|_\alpha \leq 1 \}
</math>
where <math> \sup </math> denotes the [[Infimum and supremum|supremum]]. This norm measures how much the mapping induced by <math>A</math> can stretch vectors.
Depending on the vector norms <math>\|\cdot\|_{\alpha}</math>, <math>\|\cdot\|_{\beta}</math> used, notation other than <math>\|\cdot\|_{\alpha,\beta}</math> can be used for the operator norm.

===Matrix norms induced by vector ''p''-norms===
If the [[Vector norm#p-norm|''p''-norm for vectors]] (<math>1 \leq p \leq \infty</math>) is used for both spaces <math>K^n</math> and <math>K^m,</math> then the corresponding operator norm is:<ref name=":1" />
<math display="block">
\|A\|_p = \sup \{ \|Ax\|_p : x \in K^n \text{ such that } \|x\|_p \leq 1 \}.
</math>
These induced norms are different from the [[#"Entry-wise" matrix norms|"entry-wise"]] ''p''-norms and the [[Schatten norm|Schatten ''p''-norms]] for matrices treated below, which are also usually denoted by <math> \|A\|_p .</math>

Geometrically speaking, one can imagine a ''p''-norm unit ball <math>V_{p, n} = \{x\in K^n : \|x\|_p \le 1 \}</math> in <math>K^n</math>, then apply the linear map <math>A</math> to the ball. It would end up becoming a distorted convex shape <math>AV_{p, n} \subset K^m</math>, and <math> \|A\|_p </math> measures the longest "radius" of the distorted convex shape. In other words, we must take a ''p''-norm unit ball <math>V_{p, m}</math> in <math>K^m</math>, then multiply it by at least <math> \|A\|_p </math>, in order for it to be large enough to contain <math>AV_{p, n}</math>.

==== ''p'' = 1 or ∞ ====

When <math>\ p = 1\ ,</math> or <math>\ p = \infty\ ,</math> we have simple formulas.
<math display="block"> \|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^m \left| a_{ij} \right|\ ,</math>
which is simply the maximum absolute column sum of the matrix.
<math display="block"> \|A\|_\infty = \max_{1 \leq i \leq m} \sum _{j=1}^n \left| a_{ij} \right|\ ,</math>
which is simply the maximum absolute row sum of the matrix.

For example, for
<math display="block">A = \begin{bmatrix} -3 & 5 & 7 \\ ~~2 & 6 & 4 \\ ~~0 & 2 & 8 \\ \end{bmatrix}\ ,</math>
we have that
<math display="block">\|A\|_1 = \max\bigl\{\ |{-3}|+2+0\ ,~ 5+6+2\ ,~ 7+4+8\ \bigr\} = \max\bigl\{\ 5\ ,~ 13\ ,~ 19\ \bigr\} = 19\ ,</math>
<math display="block">\|A\|_\infty = \max\bigl\{\ |{-3}|+5+7\ ,~ 2+6+4\ ,~ 0+2+8\ \bigr\} = \max\bigl\{\ 15\ ,~ 12\ ,~ 10\ \bigr\} = 15 ~.</math>
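
The two formulas can be checked mechanically. The following is a minimal sketch in plain Python, assuming a matrix is stored as a list of rows; the function names <code>induced_1_norm</code> and <code>induced_inf_norm</code> are illustrative and not taken from any library.

```python
def induced_1_norm(a):
    """Maximum absolute column sum of a matrix given as a list of rows."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def induced_inf_norm(a):
    """Maximum absolute row sum of a matrix given as a list of rows."""
    return max(sum(abs(x) for x in row) for row in a)


# The example matrix from the text above.
A = [[-3, 5, 7],
     [2, 6, 4],
     [0, 2, 8]]

print(induced_1_norm(A))    # 19
print(induced_inf_norm(A))  # 15
```

Both functions also accept non-square matrices, since the formulas only require column sums and row sums.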

==== Spectral norm (''p'' = 2) ====
{{anchor|Spectral norm}}
When <math>p = 2</math> (the [[Euclidean norm]] or <math>\ell_2</math>-norm for vectors), the induced matrix norm is the ''spectral norm''. The spectral norm should not be confused with the [[spectral radius]]: in general the two values do ''not'' coincide (see [[Spectral radius]] for further discussion). The spectral norm of a matrix <math>A</math> is the largest [[singular value]] of <math>A</math>, i.e., the square root of the largest [[eigenvalue]] of the matrix <math>A^*A,</math> where <math>A^*</math> denotes the [[conjugate transpose]] of <math>A</math>:<ref>Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, §5.2, p.281, Society for Industrial & Applied Mathematics, June 2000.</ref><math display="block"> \|A\|_2 = \sqrt{\lambda_{\max}\left(A^* A\right)} = \sigma_{\max}(A),</math>where <math>\sigma_{\max}(A)</math> denotes the largest singular value of <math>A.</math>

There are further properties:
* <math display="inline">\|A \|_2 = \sup\{|x^* A y| : x \in K^m, y \in K^n \text{ with }\|x\|_2 = \|y\|_2 = 1\}.</math> Proven by the [[Cauchy–Schwarz inequality]].
* <math display="inline"> \| A^* A\|_2 = \| A A^* \|_2 = \|A\|_2^2</math>. Proven by [[singular value decomposition]] (SVD) on <math>A</math>.
* <math display="inline"> \|A\| _2 = \sigma_{\mathrm{max}}(A) \leq \|A\|_{\rm F} = \sqrt{\sum_i \sigma_{i}(A)^2}</math>, where <math>\|A\|_\textrm{F}</math> is the [[#Frobenius norm|Frobenius norm]]. Equality holds if and only if the matrix <math>A</math> is a rank-one matrix or a zero matrix.
* Conversely, <math>\|A\|_\textrm{F} \leq \min(m,n)^{1/2}\|A\|_2</math>.

* <math> \|A\|_2 = \sqrt{\rho(A^{*}A)}\leq\sqrt{\|A^{*}A\|_\infty}\leq\sqrt{\|A\|_1\|A\|_\infty} </math>.
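
The characterization of the spectral norm as the square root of the largest eigenvalue of <math>A^*A</math> suggests a simple numerical estimate. The sketch below uses power iteration on <math>A^{\mathsf T}A</math> for real matrices; this is one possible technique chosen for illustration, not a method prescribed by the sources cited here. The function name <code>spectral_norm</code>, the starting vector and the iteration count are all assumptions, and the estimate is only reliable when the starting vector is not orthogonal to the dominant singular direction.

```python
import math


def spectral_norm(a, iterations=500):
    """Estimate the largest singular value of a real matrix (list of rows)
    by power iteration on A^T A. The result never exceeds the true norm."""
    m, n = len(a), len(a[0])
    x = [1.0 / (j + 1) for j in range(n)]  # arbitrary nonzero starting vector
    for _ in range(iterations):
        ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(m)]  # A x
        y = [sum(a[i][j] * ax[i] for i in range(m)) for j in range(n)]  # A^T (A x)
        norm_y = math.sqrt(sum(v * v for v in y))
        if norm_y == 0.0:
            # x lies in the null space of A; correct for the zero matrix,
            # otherwise a different starting vector would be needed.
            return 0.0
        x = [v / norm_y for v in y]
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(m)]
    return math.sqrt(sum(v * v for v in ax))  # ||A x||_2 for a unit vector x


A = [[-3, 5, 7],
     [2, 6, 4],
     [0, 2, 8]]

print(spectral_norm(A))      # estimate of ||A||_2
print(math.sqrt(19 * 15))    # the upper bound sqrt(||A||_1 * ||A||_inf)
print(math.sqrt(207))        # the Frobenius norm, another upper bound
```

For production use, a library routine based on the singular value decomposition would normally be preferred over this hand-written loop.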

===Matrix norms induced by vector ''α''- and ''β''-norms===
We can generalize the above definition. Suppose we have vector norms <math>\|\cdot\|_{\alpha}</math> and <math>\|\cdot\|_{\beta}</math> for spaces <math>K^n</math> and <math>K^m</math> respectively; the corresponding operator norm is
<math display="block">
\|A\|_{\alpha, \beta} = \sup\{ \|Ax\|_\beta : x \in K^n \text{ such that } \|x\|_\alpha \leq 1 \}
</math>
In particular, the <math>\|A\|_{p}</math> defined previously is the special case of <math>\|A\|_{p, p}</math>.

In the special case of <math>\alpha = 2</math> and <math>\beta=\infty</math>, the induced matrix norm can be computed by <math display="block"> \|A\|_{2,\infty}= \max_{1\le i\le m}\|A_{i:}\|_2, </math> where <math>A_{i:}</math> is the ''i''-th row of the matrix <math> A </math>.

In the special case of <math>\alpha = 1</math> and <math>\beta=2</math>, the induced matrix norm can be computed by <math display="block"> \|A\|_{1, 2} = \max_{1\le j\le n}\|A_{:j}\|_2, </math> where <math>A_{:j}</math> is the ''j''-th column of the matrix <math> A </math>.

Hence, <math> \|A\|_{2,\infty} </math> and <math> \|A\|_{1, 2} </math> are the maximum row and column 2-norm of the matrix, respectively.
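
As an illustration, both quantities reduce to a maximum over row or column lengths. The sketch below assumes a matrix stored as a list of rows; the names <code>norm_2_inf</code> and <code>norm_1_2</code> are hypothetical helpers introduced only for this example.

```python
import math


def norm_2_inf(a):
    """||A||_{2,inf}: the largest Euclidean norm among the rows of A."""
    return max(math.sqrt(sum(abs(x) ** 2 for x in row)) for row in a)


def norm_1_2(a):
    """||A||_{1,2}: the largest Euclidean norm among the columns of A."""
    return max(math.sqrt(sum(abs(x) ** 2 for x in col)) for col in zip(*a))


A = [[3, 0],
     [4, 5]]

print(norm_2_inf(A))  # row norms are 3 and sqrt(41); the larger is returned
print(norm_1_2(A))    # both column norms equal 5
```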

===Properties===

Any operator norm is [[#Consistent and compatible norms|consistent]]  with the vector norms that induce it, giving
<math display="block">\|Ax\|_\beta \leq \|A\|_{\alpha,\beta}\|x\|_\alpha.</math>

Suppose <math>\|\cdot\|_{\alpha,\beta}</math>; <math>\|\cdot\|_{\beta,\gamma}</math>; and <math>\|\cdot\|_{\alpha,\gamma}</math> are operator norms induced by the respective pairs of vector norms <math>(\|\cdot\|_\alpha, \|\cdot\|_\beta)</math>; <math>(\|\cdot\|_\beta, \|\cdot\|_{\gamma})</math>; and <math>(\|\cdot\|_\alpha, \|\cdot\|_\gamma)</math>.  Then,
:<math>\|AB\|_{\alpha,\gamma} \leq \|A\|_{\beta, \gamma} \|B\|_{\alpha, \beta} ;</math>
this follows from
<math display="block">\|ABx\|_\gamma \leq \|A\|_{\beta, \gamma} \|Bx\|_\beta \leq \|A\|_{\beta, \gamma} \|B\|_{\alpha, \beta} \|x\|_\alpha </math>
and
<math display="block">\sup_{\|x\|_\alpha = 1} \|ABx \|_\gamma = \|AB\|_{\alpha, \gamma} .</math>

===Square matrices===
Suppose <math>\|\cdot\|_{\alpha, \alpha}</math> is an operator norm on the space of square matrices <math>K^{n \times n}</math>
induced by the same vector norm <math>\|\cdot\|_{\alpha}</math> on both the domain and the codomain.
Then the operator norm is a sub-multiplicative matrix norm:
<math display="block">\|AB\|_{\alpha, \alpha} \leq \|A\|_{\alpha, \alpha} \|B\|_{\alpha, \alpha}.</math>

Moreover, any such norm satisfies the inequality
{{NumBlk||<math display="block">(\|A^r\|_{\alpha, \alpha})^{1/r} \ge \rho(A) </math>  | {{EquationRef|1}}}}
for all positive integers ''r'', where {{math|''ρ''(''A'')}} is the [[spectral radius]] of {{mvar|A}}. For [[Symmetric matrix|symmetric]] or [[Hermitian matrix|hermitian]] {{mvar|A}}, we have equality in ({{EquationNote|1}}) for the 2-norm, since in this case the 2-norm ''is'' precisely the spectral radius of {{mvar|A}}. For an arbitrary matrix, we may not have equality for any norm; a counterexample would be
<math display="block">A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},</math>
which has vanishing spectral radius. In any case, for any matrix norm, we have the [[Spectral radius#Gelfand's formula|spectral radius formula]]:
<math display="block">\lim_{r\to\infty}\|A^r\|^{1/r}=\rho(A). </math>
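
As a minimal worked check of the counterexample above, taking the spectral norm as the norm in question (a choice made here only for illustration):
<math display="block">A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad A^2 = 0, \qquad \|A\|_2 = 1 > 0 = \rho(A), \qquad \|A^r\|_2^{1/r} = 0 \text{ for all } r \ge 2 .</math>
Inequality ({{EquationNote|1}}) is therefore strict for <math>r = 1</math>, which holds for every norm because <math>A \neq 0</math>, while the limit in the spectral radius formula is already attained at <math>r = 2</math>.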

===Energy norms===

If the vector norms <math>\|\cdot\|_{\alpha}</math> and <math>\|\cdot\|_{\beta}</math> are given in terms of [[Norm_(mathematics)#Energy_norm|energy norms]] based on [[Symmetric_matrix|symmetric]] [[Definite_matrix|positive definite]] matrices <math>P</math> and <math>Q</math> respectively, the resulting operator norm is given as
<math display="block">
\|A\|_{P, Q} = \sup \{ \|Ax\|_Q : \|x\|_P \leq 1 \}.
</math>

Using the symmetric [[Square_root_of_a_matrix|matrix square roots]] of <math>P</math> and <math>Q</math> respectively, the operator norm can be expressed as the spectral norm of a modified matrix:

<math display="block">
\|A\|_{P, Q} = \|Q^{1/2} A P^{-1/2}\|_{2}.
</math>
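
The identity can be sketched in one line, assuming the convention (not spelled out above) that the energy norm of a vector is <math>\|x\|_P = \sqrt{x^* P x} = \|P^{1/2} x\|_2</math>. Substituting <math>y = P^{1/2} x</math>, so that <math>x = P^{-1/2} y</math>, gives
<math display="block">
\|A\|_{P, Q} = \sup_{\|P^{1/2} x\|_2 \leq 1} \|Q^{1/2} A x\|_2 = \sup_{\|y\|_2 \leq 1} \|Q^{1/2} A P^{-1/2} y\|_2 = \|Q^{1/2} A P^{-1/2}\|_2 .
</math>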

==Consistent and compatible norms==
A matrix norm <math>\| \cdot \|</math> on <math>K^{m \times n}</math> is called ''consistent'' with a vector norm <math>\| \cdot \|_{\alpha}</math> on <math>K^n</math> and a vector norm <math>\| \cdot \|_{\beta}</math> on <math>K^m</math>, if:
<math display="block">\left\|Ax\right\|_{\beta} \leq \left\|A\right\| \left\|x\right\|_{\alpha}</math>
for all <math>A \in K^{m \times n}</math> and all <math>x \in K^n</math>.  In the special case of {{math|1=''m'' = ''n''}} and <math>\alpha = \beta</math>, <math>\| \cdot \|</math> is also called ''compatible'' with <math>\|\cdot \|_{\alpha}</math>.

All induced norms are consistent by definition.  Also, any sub-multiplicative matrix norm on <math> K^{n \times n} </math> induces a compatible vector norm on <math>K^n</math> by defining <math> \left\| v \right\| := \left\| \left( v, v, \dots, v \right) \right\| </math>.

=="Entry-wise" matrix norms==
These norms treat an <math> m \times n </math> matrix as a vector of size <math> m \cdot n </math>, and use one of the familiar vector norms. For example, using the ''p''-norm for vectors, {{nowrap|''p'' ≥ 1}}, we get:

:<math>\| A \|_{p,p} = \| \mathrm{vec}(A) \|_p = \left( \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^p \right)^{1/p}</math>

This is a different norm from the induced ''p''-norm (see above) and the Schatten ''p''-norm (see below), but the notation is the same.

The special case ''p'' = 2 is the Frobenius norm, and ''p'' = &infin; yields the maximum norm.

==={{math|''L''<sub>2,1</sub>}} and {{math|''L<sub>p,q</sub>''}} norms===

Let <math>(a_1, \ldots, a_n) </math> be the columns of the matrix <math>A</math>, each of dimension {{mvar|m}}. Viewed this way, the matrix <math> A </math> represents {{mvar|n}} data points in an {{mvar|m}}-dimensional space. The <math>L_{2,1}</math> norm<ref>{{cite conference | last1=Ding | first1=Chris | last2=Zhou | first2=Ding | last3=He | first3=Xiaofeng | last4=Zha | first4=Hongyuan  |date = June 2006 | title = R1-PCA: Rotational invariant L1-norm principal component analysis for robust subspace factorization | conference = 23rd International Conference on Machine Learning | series=ICML '06 | isbn = 1-59593-383-2 | place = Pittsburgh, PA | pages=281–288 | doi=10.1145/1143844.1143880 | publisher=[[Association for Computing Machinery]] }}</ref> is the sum of the Euclidean norms of the columns of the matrix:

:<math>\| A \|_{2,1} = \sum_{j=1}^n \| a_{j} \|_2 = \sum_{j=1}^n \left( \sum_{i=1}^m |a_{ij}|^2 \right)^{1/2}</math>

The <math>L_{2,1}</math> norm as an error function is more robust to outliers, since the error for each data point (a column) is not squared. It is used in [[robust data analysis]] and [[sparse coding]].

For {{nowrap|''p'', ''q'' ≥ 1}}, the <math>L_{2,1}</math> norm can be generalized to the <math>L_{p,q}</math> norm as follows:

:<math>\| A \|_{p,q} =  \left(\sum_{j=1}^n \left( \sum_{i=1}^m |a_{ij}|^p \right)^{\frac{q}{p}}\right)^{\frac{1}{q}}.</math>
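
The definition translates directly into code: take the ''p''-norm of each column, then the ''q''-norm of the resulting vector. The following is a minimal sketch for finite ''p'' and ''q'', assuming a matrix stored as a list of rows; the name <code>lpq_norm</code> is illustrative only.

```python
def lpq_norm(a, p, q):
    """L_{p,q} norm: the q-norm of the vector of column p-norms
    (finite p, q >= 1; matrix given as a list of rows)."""
    column_norms = [sum(abs(x) ** p for x in col) ** (1.0 / p) for col in zip(*a)]
    return sum(c ** q for c in column_norms) ** (1.0 / q)


A = [[3, 0],
     [4, 5]]

print(lpq_norm(A, 2, 1))  # L_{2,1}: the column norms 5 and 5 add up to about 10
print(lpq_norm(A, 2, 2))  # p = q = 2: sqrt(9 + 16 + 0 + 25) = sqrt(50)
```

With <code>p = q</code> the function reduces to the entry-wise ''p''-norm of the whole matrix.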

===Frobenius norm===
{{Main|Hilbert–Schmidt operator}}
{{See also|Frobenius inner product}}

When {{nowrap|1=''p'' = ''q'' = 2}} for the <math>L_{p,q}</math> norm, it is called the '''Frobenius norm''' or the '''Hilbert–Schmidt norm''', though the latter term is used more frequently in the context of operators on (possibly infinite-dimensional) [[Hilbert space]]. This norm can be defined in various ways:

:<math>\|A\|_\text{F} = \sqrt{\sum_{i=1}^m\sum_{j=1}^n |a_{ij}|^2} = \sqrt{\operatorname{trace}\left(A^* A\right)} = \sqrt{\sum_{i=1}^{\min\{m, n\}} \sigma_i^2(A)},</math>

where the [[trace (matrix)|trace]] is the sum of diagonal entries, and <math>\sigma_i(A)</math> are the [[singular value]]s of <math>A</math>. The second equality is proven by explicit computation of <math>\mathrm{trace}(A^*A)</math>. The third equality is proven by [[singular value decomposition]] of <math>A</math>, and the fact that the trace is invariant under circular shifts.

The Frobenius norm is an extension of the Euclidean norm to <math>K^{m \times n}</math> and comes from the [[Frobenius inner product]] on the space of all matrices.

The Frobenius norm is sub-multiplicative and is very useful for [[numerical linear algebra]]. The sub-multiplicativity of the Frobenius norm can be proved using the [[Cauchy–Schwarz inequality]]. In fact, it is more than sub-multiplicative, as <math display="block">\|AB\|_\text{F} \leq\|A\|_{op}\|B\|_\text{F},</math> where the operator norm satisfies <math>\|\cdot\|_{op} \leq \|\cdot\|_\text{F}</math>.

The Frobenius norm is often easier to compute than induced norms, and has the useful property of being invariant under [[rotation matrix|rotations]] (and [[Unitary operator|unitary]] operations in general). That is, <math>\|A\|_\text{F} = \|AU\|_\text{F} = \|UA\|_\text{F}</math> for any unitary matrix <math>U</math>. This property follows from the cyclic nature of the trace (<math>\operatorname{trace}(XYZ) =\operatorname{trace}(YZX) = \operatorname{trace}(ZXY)</math>):

:<math>\|AU\|_\text{F}^2 = \operatorname{trace}\left( (AU)^{*}A U \right)
  = \operatorname{trace}\left( U^{*} A^{*}A U \right)
  = \operatorname{trace}\left( UU^{*} A^{*}A \right)
  = \operatorname{trace}\left( A^{*} A \right)
  = \|A\|_\text{F}^2,</math>

and analogously:

:<math>\|UA\|_\text{F}^2 = \operatorname{trace}\left( (UA)^{*}UA \right)
  = \operatorname{trace}\left( A^{*} U^{*} UA  \right)
  = \operatorname{trace}\left( A^{*}A \right)
  = \|A\|_\text{F}^2,</math>

where we have used the unitary nature of <math>U</math> (that is, <math>U^* U = U U^* = \mathbf{I}</math>).

It also satisfies

:<math>\|A^* A\|_\text{F} = \|AA^*\|_\text{F} \leq \|A\|_\text{F}^2</math>
and 
:<math>\|A + B\|_\text{F}^2 = \|A\|_\text{F}^2 + \|B\|_\text{F}^2 + 2 \operatorname{Re} \left( \langle A, B \rangle_\text{F} \right),</math>

where <math>\langle A, B \rangle_\text{F}</math> is the [[Frobenius inner product]], and Re is the real part of a complex number (irrelevant for real matrices).

===Max norm===
The '''max norm''' is the elementwise norm in the limit as {{nowrap|1=''p'' = ''q''}} goes to infinity:

:<math> \|A\|_{\max} = \max_{i, j} |a_{ij}|. </math>

This norm is not [[#Preliminaries|sub-multiplicative]], but modifying the right-hand side to <math>\sqrt{m n} \max_{i, j} \vert a_{i j} \vert</math> makes it so.
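
A small worked example, with matrices chosen here purely for illustration, shows the failure of sub-multiplicativity for <math>m = n = 2</math>:
<math display="block">A = B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad AB = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}, \qquad \|AB\|_{\max} = 2 > 1 = \|A\|_{\max}\|B\|_{\max} .</math>
Writing <math>N(A) = \sqrt{mn}\,\|A\|_{\max} = 2\|A\|_{\max}</math> for the rescaled norm (the symbol <math>N</math> is introduced only for this example), the same matrices give <math>N(AB) = 4 \le 4 = N(A)\,N(B)</math>, so the rescaled inequality holds, here with equality.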

Note that in some literature (such as [[Communication complexity]]), an alternative definition of max-norm, also called the <math>\gamma_2</math>-norm, refers to the factorization norm:

:<math> \gamma_2(A) = \min_{U,V: A = UV^T} \| U \|_{2,\infty} \| V \|_{2,\infty} =  \min_{U,V: A = UV^T} \max_{i,j} \| U_{i,:} \|_2 \| V_{j,:} \|_2 </math>

==Schatten norms==
{{Further|Schatten norm}}

The Schatten ''p''-norms arise when applying the ''p''-norm to the vector of [[singular value decomposition|singular values]] of a matrix.<ref name=":1" /> If the singular values of the <math>m \times n</math> matrix <math>A</math> are denoted by ''&sigma;<sub>i</sub>'', then the Schatten ''p''-norm is defined by

:<math> \|A\|_p = \left( \sum_{i=1}^{\min\{m,n\}} \sigma_i^p(A) \right)^{1/p}.</math>

These norms again share the notation with the induced and entry-wise ''p''-norms, but they are different.

All Schatten norms are sub-multiplicative. They are also unitarily invariant, which means that <math>\|A\| = \|UAV\|</math> for all matrices <math>A</math> and all [[unitary matrix|unitary matrices]] <math>U</math> and <math>V</math>.

The most familiar cases are ''p'' = 1, 2, &infin;. The case ''p'' = 2 yields the Frobenius norm, introduced before. The case ''p'' =&nbsp;&infin; yields the spectral norm, which is the operator norm induced by the vector 2-norm (see above). Finally, ''p'' = 1 yields the '''nuclear norm''' (also known as the ''trace norm'', or the [[Singular Value Decomposition#Ky Fan norms|Ky Fan]] ''n''-norm<ref>{{Cite journal|last=Fan|first=Ky.|date=1951|title=Maximum properties and inequalities for the eigenvalues of completely continuous operators|journal=Proceedings of the National Academy of Sciences of the United States of America| volume=37|issue=11|pages=760–766|doi=10.1073/pnas.37.11.760|pmc=1063464|pmid=16578416|bibcode=1951PNAS...37..760F|doi-access=free}}</ref>), defined as:

: <math>\|A\|_{*} = \operatorname{trace} \left(\sqrt{A^*A}\right) = \sum_{i=1}^{\min\{m,n\}} \sigma_i(A),</math>

where <math>\sqrt{A^*A}</math> denotes a positive semidefinite matrix <math>B</math> such that <math>BB=A^*A</math>. More precisely, since <math>A^*A</math> is a [[positive semidefinite matrix]], its [[square root of a matrix|square root]] is well defined. The nuclear norm <math>\|A\|_{*}</math> is the [[convex envelope]] of the rank function <math>\text{rank}(A)</math> on the unit ball of the spectral norm, so it is often used in [[mathematical optimization]] to search for low-rank matrices.

Combining [[von Neumann's trace inequality]] with [[Hölder's inequality]] for Euclidean space yields a version of [[Hölder's inequality]] for Schatten norms for <math> 1/p + 1/q = 1 </math>:

: <math>\left|\operatorname{trace}(A^*B)\right| \le \|A\|_p \|B\|_q.</math>

In particular, this implies the Schatten norm inequality

: <math> \|A\|_F^2 \le \|A\|_p \|A\|_q. </math>
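
As a worked check, consider the diagonal matrix below, chosen here only for illustration; its singular values are 4 and 3, so the three most familiar Schatten norms are
<math display="block">A = \begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix}, \qquad \|A\|_{*} = 4 + 3 = 7, \qquad \|A\|_F = \sqrt{4^2 + 3^2} = 5, \qquad \|A\|_\infty = \max\{4, 3\} = 4 .</math>
Taking <math>p = 1</math> and <math>q = \infty</math> in the last inequality gives <math>\|A\|_F^2 = 25 \le 7 \cdot 4 = 28</math>, as expected.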

==Monotone norms==
A matrix norm <math>\|\cdot \|</math> is called ''monotone'' if it is monotonic with respect to the [[Loewner order]]. Thus, a matrix norm is monotone if

:<math>A \preccurlyeq B \Rightarrow \|A\| \leq \|B\|.</math>

The Frobenius norm and spectral norm are examples of monotone norms.<ref>{{cite book |last1=Ciarlet |first1=Philippe G. |title=Introduction to numerical linear algebra and optimisation |date=1989 |publisher=Cambridge University Press |location=Cambridge, England |isbn=0521327881 |page=57}}</ref>

==Cut norms==
Another source of inspiration for matrix norms arises from considering a matrix as the [[adjacency matrix]] of a [[Weighted graph|weighted]], [[directed graph]].<ref name="FK">{{Cite journal|last1=Frieze| first1=Alan| last2=Kannan|first2=Ravi| date=1999-02-01|title=Quick Approximation to Matrices and Applications| url=https://doi.org/10.1007/s004930050052| journal=Combinatorica|language=en| volume=19 |issue=2 |pages=175–220 |doi=10.1007/s004930050052 |s2cid=15231198 |issn=1439-6912|url-access=subscription}}</ref>  The so-called "cut norm" measures how close the associated graph is to being [[bipartite graph|bipartite]]:
<math display="block">\|A\|_{\Box}=\max_{S\subseteq[n], T\subseteq[m]}{\left|\sum_{s\in S,t\in T}{A_{t,s}}\right|}</math> 
where {{math|''A'' &isin; ''K''<sup>''m''×''n''</sup>}}.<ref name="FK" /><ref name="LNGL">{{Cite book| last=Lovász László|title=Large Networks and Graph Limits |publisher=American Mathematical Society|year=2012| isbn=978-0-8218-9085-1 | series=AMS Colloquium Publications|volume=60| location=Providence, RI|pages=127–131 |chapter=The cut distance|author-link=László Lovász}}  Note that Lovász rescales {{math|‖''A''‖<sub>□</sub>}} to lie in {{closed-closed|0, 1}}.</ref><ref name="AN">{{Cite book|last1=Alon |first1=Noga |author-link=Noga Alon| last2=Naor| first2=Assaf|title=Proceedings of the thirty-sixth annual ACM symposium on Theory of computing |chapter=Approximating the cut-norm via Grothendieck's inequality | date=2004-06-13| chapter-url=https://doi.org/10.1145/1007352.1007371 | series=STOC '04 |location=Chicago, IL, USA | publisher=Association for Computing Machinery| pages=72–80| doi=10.1145/1007352.1007371 | isbn=978-1-58113-852-8 |s2cid=1667427}}</ref>  Equivalent definitions (up to a constant factor) impose the conditions {{math|2{{abs|''S''}} > ''n'' &amp; 2{{abs|''T''}} > ''m''}}; {{math|1=''S'' = ''T''}}; or {{math|1=''S'' &cap; ''T'' = &emptyset;}}.<ref name="LNGL" />
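
For very small matrices the definition can be evaluated by exhaustive search. The sketch below is a brute-force illustration only, assuming a real matrix stored as a list of rows; the names <code>subsets</code> and <code>cut_norm</code> are hypothetical, and the running time grows like <math>2^{m+n}</math>, so this is not the approximation algorithm of the cited references.

```python
from itertools import combinations


def subsets(k):
    """Yield every subset of range(k) as a tuple, including the empty one."""
    for size in range(k + 1):
        yield from combinations(range(k), size)


def cut_norm(a):
    """Brute-force cut norm of a real matrix given as a list of rows.

    Enumerates all row subsets T and column subsets S, so the cost grows
    like 2**(m + n); only suitable for very small matrices.
    """
    m, n = len(a), len(a[0])
    best = 0
    for T in subsets(m):
        # Sum the selected rows once per T, column by column.
        col_sums = [sum(a[t][s] for t in T) for s in range(n)]
        for S in subsets(n):
            best = max(best, abs(sum(col_sums[s] for s in S)))
    return best


A = [[1, -1],
     [-1, 1]]

print(cut_norm(A))  # 1: for example S = T = {0} picks out the single entry 1
```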

The cut-norm is equivalent to the induced operator norm {{math|‖·‖<sub>&infin;→1</sub>}}, which is itself equivalent to another norm, called the [[Grothendieck inequality|Grothendieck]] norm.<ref name="AN" />

To define the Grothendieck norm, first note that a linear operator {{Math|''K''<sup>1</sup> → ''K''<sup>1</sup>}} is just a scalar, and thus extends to a linear operator on any {{Math|''K<sup>k</sup>'' → ''K<sup>k</sup>''}}.  Moreover, given any choice of basis for {{Math|''K<sup>n</sup>''}} and {{Math|''K<sup>m</sup>''}}, any linear operator {{Math|''K<sup>n</sup>'' → ''K<sup>m</sup>''}} extends to a linear operator {{Math|(''K''<sup>''k''</sup>)<sup>''n''</sup> → (''K''<sup>''k''</sup>)<sup>''m''</sup>}}, by letting each matrix element act on elements of {{Math|''K<sup>k</sup>''}} via scalar multiplication.  The Grothendieck norm is the norm of that extended operator; in symbols:<ref name="AN" />
<math display="block">\|A\|_{G,k}=\sup_{\text{each } u_j, v_\ell\in K^k; \|u_j\| = \|v_\ell\| = 1}{\sum_{j \in [n], \ell \in [m]}{(u_j\cdot v_\ell) A_{\ell,j}}}</math>

The Grothendieck norm depends on choice of basis (usually taken to be the [[standard basis]]) and {{mvar|k}}.

==Equivalence of norms==
{{See also|Equivalent norms}}

For any two matrix norms <math>\|\cdot\|_{\alpha}</math> and <math>\|\cdot\|_{\beta}</math>, we have that:

:<math>r\|A\|_\alpha\leq\|A\|_\beta\leq s\|A\|_\alpha</math>

for some positive numbers ''r'' and ''s'', for all matrices <math>A\in K^{m \times n}</math>. In other words, all norms on <math>K^{m \times n}</math> are ''equivalent''; they induce the same [[topology (structure)|topology]] on <math>K^{m \times n}</math>. This is true because the vector space <math>K^{m \times n}</math> has the finite [[dimension (mathematics)|dimension]] <math>mn</math>.

Moreover, for every matrix norm <math>\|\cdot\|</math> on <math>\R^{n\times n}</math> there exists a unique positive real number <math>k</math> such that <math>\ell\|\cdot\|</math> is a sub-multiplicative matrix norm for every <math>\ell \ge k</math>; to wit,
:<math>k = \sup\{\Vert A B \Vert \,:\, \Vert A \Vert \leq 1, \Vert B \Vert \leq 1\}. </math>

A sub-multiplicative matrix norm <math>\|\cdot\|_{\alpha}</math> is said to be ''minimal'', if there exists no other sub-multiplicative matrix norm <math>\|\cdot\|_{\beta}</math> satisfying <math>\|\cdot\|_{\beta} < \|\cdot\|_{\alpha}</math>.

===Examples of norm equivalence===
Let <math>\|A\|_p</math> once again refer to the norm induced by the vector ''p''-norm (as above in the Induced norm section).

For matrix <math>A\in\R^{m\times n}</math> of [[Rank (linear algebra)|rank]] <math>r</math>, the following inequalities hold:<ref>
[[Gene Golub|Golub, Gene]]; [[Charles Van Loan|Charles F. Van Loan]] (1996). Matrix Computations – Third Edition. Baltimore: The Johns Hopkins University Press, 56–57. {{ISBN|0-8018-5413-X}}.</ref><ref>Roger Horn and Charles Johnson. ''Matrix Analysis,'' Chapter 5, Cambridge University Press, 1985. {{ISBN|0-521-38632-2}}.</ref>

*<math>\|A\|_2\le\|A\|_F\le\sqrt{r}\|A\|_2</math>
*<math>\|A\|_F \le \|A\|_{*} \le \sqrt{r} \|A\|_F</math>
*<math>\|A\|_{\max} \le \|A\|_2 \le \sqrt{mn}\|A\|_{\max}</math>
*<math>\frac{1}{\sqrt{n}}\|A\|_\infty\le\|A\|_2\le\sqrt{m}\|A\|_\infty</math>
*<math>\frac{1}{\sqrt{m}}\|A\|_1\le\|A\|_2\le\sqrt{n}\|A\|_1.</math>
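
As a worked check, take the all-ones matrix <math>J = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}</math> (the symbol <math>J</math> and the choice of matrix are made here only for illustration). It has <math>m = n = 2</math>, rank <math>r = 1</math> and singular values 2 and 0, so <math>\|J\|_2 = \|J\|_F = \|J\|_{*} = 2</math>, <math>\|J\|_{\max} = 1</math> and <math>\|J\|_1 = \|J\|_\infty = 2</math>. The inequalities above then read:

*<math>2 \le 2 \le \sqrt{1}\cdot 2</math>
*<math>2 \le 2 \le \sqrt{1}\cdot 2</math>
*<math>1 \le 2 \le \sqrt{4}\cdot 1</math>
*<math>\tfrac{1}{\sqrt{2}}\cdot 2 \le 2 \le \sqrt{2}\cdot 2</math>
*<math>\tfrac{1}{\sqrt{2}}\cdot 2 \le 2 \le \sqrt{2}\cdot 2 .</math>

Several of the bounds are attained with equality for this matrix, which shows that the corresponding constants cannot be improved in general.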

==See also==
* [[Dual norm]]
* [[Logarithmic norm]]

==Notes==
{{notelist|group=Note}}

==References==
{{reflist}}

==Bibliography==
* [[James W. Demmel]], Applied Numerical Linear Algebra, section 1.7, published by SIAM, 1997.
* Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, published by SIAM, 2000. [http://www.matrixanalysis.com]
* [[John Watrous (computer scientist)|John Watrous]], Theory of Quantum Information, [https://web.archive.org/web/20160304053759/https://cs.uwaterloo.ca/~watrous/CS766/LectureNotes/02.pdf 2.3 Norms of operators], lecture notes, University of Waterloo, 2011.
* [[Kendall Atkinson]], An Introduction to Numerical Analysis, published by John Wiley & Sons, Inc., 1989.

{{Authority control}}

[[Category:Norms (mathematics)]]
[[Category:Linear algebra]]