== Mathematical definitions ==

Linear independent component analysis can be divided into noiseless and noisy cases, where noiseless ICA is a special case of noisy ICA. Nonlinear ICA should be considered as a separate case.

=== General derivation ===

In the classical ICA model, it is assumed that the observed data <math>\mathbf{x}_i \in \mathbb{R}^m</math> at time <math>t_i</math> is generated from source signals <math>\mathbf{s}_i \in \mathbb{R}^m</math> via a linear transformation <math>\mathbf{x}_i = A \mathbf{s}_i</math>, where <math>A</math> is an unknown, invertible mixing matrix. To recover the source signals, the data is first centered (zero mean) and then whitened so that the transformed data has unit covariance. Whitening reduces the problem from estimating a general matrix <math>A</math> to estimating an orthogonal matrix <math>V</math>, which significantly simplifies the search for independent components. If the covariance matrix of the centered data is <math>\Sigma_x = A A^\top</math>, then using the eigen-decomposition <math>\Sigma_x = Q D Q^\top</math>, the whitening transformation can be taken as <math>D^{-1/2} Q^\top</math>. This step ensures that the recovered sources are uncorrelated and of unit variance, leaving only the task of rotating the whitened data to maximize statistical independence. This general derivation underlies many ICA algorithms and is foundational in understanding the ICA model.<ref name="Springer">{{Cite book |last=Holmes |first=Mark |title=Introduction to Scientific Computing and Data Analysis |edition=2nd |year=2023 |publisher=Springer |isbn=978-3-031-22429-4}}</ref>

==== Reduced mixing problem ====

Independent component analysis (ICA) addresses the problem of recovering a set of unobserved source signals <math>s_i = (s_{i1}, s_{i2}, \dots, s_{im})^T</math> from observed mixed signals <math>x_i = (x_{i1}, x_{i2}, \dots, x_{im})^T</math>, based on the linear mixing model

:<math>x_i = A\,s_i,</math>

where <math>A</math> is an <math>m \times m</math> invertible matrix called the '''mixing matrix''', <math>s_i</math> is the m‑dimensional vector of source values at time <math>t_i</math>, and <math>x_i</math> is the corresponding vector of observed values at time <math>t_i</math>. The goal is to estimate both <math>A</math> and the source signals <math>\{s_i\}</math> solely from the observed data <math>\{x_i\}</math>.

Let <math>X^*</math> denote the matrix whose rows are the centered observations <math>x_i^*</math>. Its Gram matrix has the eigen-decomposition

:<math>(X^*)^T X^* = Q\,D\,Q^T,</math>

where <math>D</math> is a diagonal matrix with positive entries (assuming <math>X^*</math> has maximum rank) and <math>Q</math> is an orthogonal matrix.<ref name="Springer"/> Writing the singular value decomposition of the mixing matrix as <math>A = U \Sigma V^T</math> and comparing <math>AA^T = U \Sigma^2 U^T</math> with the decomposition above gives <math>U = Q</math> and <math>\Sigma = D^{1/2}</math>, so the mixing matrix has the form

:<math>A = Q\,D^{1/2}\,V^T.</math>

The normalized source values therefore satisfy <math>s_i^* = V\,y_i^*</math>, where <math>y_i^* = D^{-\tfrac12}Q^T x_i^*</math>. Thus, ICA reduces to finding the orthogonal matrix <math>V</math>, which can be computed using optimization techniques via projection pursuit methods (see [[#Projection pursuit|Projection pursuit]]).<ref name="Springer"/>

Well-known algorithms for ICA include [[infomax]], [[FastICA]], [[JADE (ICA)|JADE]], and [[kernel-independent component analysis]], among others. In general, ICA cannot identify the actual number of source signals, a uniquely correct ordering of the source signals, or the proper scaling (including sign) of the source signals.
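The centering and whitening step can be illustrated with a short numerical sketch. The following minimal example uses NumPy; the simulated two-source data and all variable names are chosen purely for illustration and are not part of the cited derivation.

<syntaxhighlight lang="python">
import numpy as np

# Minimal sketch of the centering and whitening step (illustrative data and names).
rng = np.random.default_rng(0)
S = rng.laplace(size=(1000, 2))            # two non-Gaussian source signals, one sample per row
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])                 # mixing matrix (unknown in a real application)
X = S @ A.T                                # observed mixtures, x_i = A s_i

X_star = X - X.mean(axis=0)                # centering: subtract the mean of each component

# Eigen-decomposition of the Gram matrix (X*)^T X* = Q D Q^T
D, Q = np.linalg.eigh(X_star.T @ X_star)

# Whitened coordinates y_i^* = D^(-1/2) Q^T x_i^*, stored as rows of Y_star
Y_star = X_star @ Q @ np.diag(D ** -0.5)

# (Y*)^T Y* is now the identity matrix; what remains is to find the orthogonal
# matrix V with s_i^* = V y_i^*, e.g. by projection pursuit.
print(np.round(Y_star.T @ Y_star, 6))
</syntaxhighlight>

After this step, only the orthogonal matrix <math>V</math> that rotates the whitened coordinates onto statistically independent components remains to be estimated.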
ICA is important to [[blind signal separation]] and has many practical applications. It is closely related to (or even a special case of) the search for a [[factorial code]] of the data, i.e., a new vector-valued representation of each data vector such that it gets uniquely encoded by the resulting code vector (loss-free coding), while the code components are statistically independent.

=== Linear noiseless ICA ===

The components <math>x_i</math> of the observed random vector <math>\boldsymbol{x}=(x_1,\ldots,x_m)^T</math> are generated as a sum of the independent components <math>s_k</math>, <math>k=1,\ldots,n</math>:

:<math>x_i = a_{i,1} s_1 + \cdots + a_{i,k} s_k + \cdots + a_{i,n} s_n,</math>

weighted by the mixing weights <math>a_{i,k}</math>.

The same generative model can be written in vector form as <math>\boldsymbol{x}=\sum_{k=1}^{n} s_k \boldsymbol{a}_k</math>, where the observed random vector <math>\boldsymbol{x}</math> is represented by the basis vectors <math>\boldsymbol{a}_k=(a_{1,k},\ldots,a_{m,k})^T</math>. The basis vectors <math>\boldsymbol{a}_k</math> form the columns of the mixing matrix <math>\boldsymbol{A}=(\boldsymbol{a}_1,\ldots,\boldsymbol{a}_n)</math>, and the generative formula can be written as <math>\boldsymbol{x}=\boldsymbol{A} \boldsymbol{s}</math>, where <math>\boldsymbol{s}=(s_1,\ldots,s_n)^T</math>.

Given the model and realizations (samples) <math>\boldsymbol{x}_1,\ldots,\boldsymbol{x}_N</math> of the random vector <math>\boldsymbol{x}</math>, the task is to estimate both the mixing matrix <math>\boldsymbol{A}</math> and the sources <math>\boldsymbol{s}</math>. This is done by adaptively calculating the <math>\boldsymbol{w}</math> vectors and setting up a cost function that either maximizes the non-Gaussianity of the calculated <math>s_k = \boldsymbol{w}^T \boldsymbol{x}</math> or minimizes the mutual information. In some cases, a priori knowledge of the probability distributions of the sources can be used in the cost function.

The original sources <math>\boldsymbol{s}</math> can be recovered by multiplying the observed signals <math>\boldsymbol{x}</math> with the inverse of the mixing matrix <math>\boldsymbol{W}=\boldsymbol{A}^{-1}</math>, also known as the unmixing matrix. Here it is assumed that the mixing matrix is square (<math>n=m</math>). If the number of basis vectors is greater than the dimensionality of the observed vectors, <math>n>m</math>, the task is overcomplete but is still solvable with the [[pseudo inverse]].

=== Linear noisy ICA ===

With the added assumption of zero-mean and uncorrelated Gaussian noise <math>n\sim N(0,\operatorname{diag}(\Sigma))</math>, the ICA model takes the form <math>\boldsymbol{x}=\boldsymbol{A} \boldsymbol{s}+n</math>.

=== Nonlinear ICA ===

The mixing of the sources does not need to be linear. Using a nonlinear mixing function <math>f(\cdot|\theta)</math> with parameters <math>\theta</math>, the [[nonlinear ICA]] model is <math>x=f(s|\theta)+n</math>.

=== Identifiability ===

The independent components are identifiable up to a permutation and scaling of the sources.<ref>Theorem 11, Comon, Pierre. "Independent component analysis, a new concept?" ''Signal Processing'' 36.3 (1994): 287–314.</ref> This identifiability requires that:
* at most one of the sources <math>s_k</math> is Gaussian, and
* the number of observed mixtures, <math>m</math>, is at least as large as the number of estimated components <math>n</math>: <math>m \ge n</math>.
It is equivalent to say that the mixing matrix <math>\boldsymbol{A}</math> must be of full [[rank (linear algebra)|rank]] for its inverse to exist.
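As an illustration of the algorithms mentioned above and of the identifiability result, the following sketch recovers two non-Gaussian sources from their mixtures using the FastICA implementation in the scikit-learn library. The simulated signals and all variable names are illustrative only.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.decomposition import FastICA

# Illustrative example: mix two non-Gaussian sources and estimate them with FastICA.
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.column_stack([np.sign(np.sin(3 * t)),        # square-wave source
                     rng.laplace(size=t.size)])     # Laplace-distributed source
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])                          # mixing matrix (unknown in practice)
X = S @ A.T                                         # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)    # estimated source signals
W = ica.components_             # estimated unmixing matrix

# Consistent with the identifiability result, S_est matches S only up to
# permutation, scaling and sign of its columns.
</syntaxhighlight>

Because the ordering, sign and amplitude of the estimated components are arbitrary, comparisons with known reference sources (as in this simulated example) are usually made column by column, up to permutation and scaling.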