Title: | Non-Negative and Sparse PCA |
---|---|
Description: | Two methods for performing a constrained principal component analysis (PCA), where non-negativity and/or sparsity constraints are enforced on the principal axes (PAs). The function 'nsprcomp' computes one principal component (PC) after the other. Each PA is optimized such that the corresponding PC has maximum additional variance not explained by the previous components. In contrast, the function 'nscumcomp' jointly computes all PCs such that the cumulative variance is maximal. Both functions have the same interface as the 'prcomp' function from the 'stats' package (plus some extra parameters), and both return the result of the analysis as an object of class 'nsprcomp', which inherits from 'prcomp'. See <https://sigg-iten.ch/learningbits/2013/05/27/nsprcomp-is-on-cran/> and Sigg et al. (2008) <doi:10.1145/1390156.1390277> for more details. |
Authors: | Christian Sigg [aut, cre] |
Maintainer: | Christian Sigg <[email protected]> |
License: | GPL (>=2) |
Version: | 0.5.1-2 |
Built: | 2025-02-01 04:58:01 UTC |
Source: | https://github.com/chrsigg/nsprcomp |
asdev
computes the additional standard deviation explained by each
principal component, taking into account the possible non-orthogonality of
the pseudo-rotation matrix .
asdev(x, w, center = TRUE, scale. = FALSE)
asdev(x, w, center = TRUE, scale. = FALSE)
x |
a numeric data matrix with the observations as rows |
w |
a numeric data matrix with the principal axes as columns |
center |
a logical value indicating whether the empirical mean of
|
scale. |
a logical value indicating whether the columns of |
The additional standard deviation of a component is measured after projecting
the corresponding principal axis to the ortho-complement space spanned by the
previous principal axes. This procedure ensures that the variance explained
by non-orthogonal principal axes is not counted multiple times. If the
principal axes are pairwise orthogonal (e.g. computed using standard PCA),
the additional standard deviations are identical to the standard deviations
of the columns of the scores matrix .
asdev
is also useful to build a partial PCA model from
, to be completed with additional components computed
using
nsprcomp
.
asdev
returns a list with class (nsprcomp, prcomp)
containing the following elements:
sdev |
the additional standard deviation explained by each component |
rotation |
copied from the
input argument |
x |
the scores matrix |
center , scale.
|
the centering and scaling used |
xp |
the deflated data matrix corresponding to |
q |
an orthonormal basis for the principal subspace |
The PCA terminology is not consistent across the literature. Given a
zero mean data matrix (with observations as rows) and a
basis
of the principal subspace, we define the scores
matrix as
which contains the principal
components as its columns. The columns of the pseudo-rotation matrix
are called the principal axes, and the elements of
are called the loadings.
Mackey, L. (2009) Deflation Methods for Sparse PCA. In Advances in Neural Information Processing Systems (pp. 1017–1024).
Computes the cardinality (the sum of non-zero elements) of each column of
the matrix .
cardinality(w)
cardinality(w)
w |
a numeric matrix, e.g. the rotation matrix of a sparse PCA analysis |
Performs a PCA-like analysis on the given data matrix, where
non-negativity and/or sparsity constraints are enforced on the principal axes
(PAs). In contrast to regular PCA, which greedily maximises the variance of
each principal component (PC), nscumcomp
jointly optimizes the
components such that the cumulative variance of all PCs is maximal.
nscumcomp(x, ...) ## Default S3 method: nscumcomp(x, ncomp = min(dim(x)), omega = rep(1, nrow(x)), k = d * ncomp, nneg = FALSE, gamma = 0, center = TRUE, scale. = FALSE, nrestart = 5, em_tol = 0.001, em_maxiter = 20, verbosity = 0, ...) ## S3 method for class 'formula' nscumcomp(formula, data = NULL, subset, na.action, ...)
nscumcomp(x, ...) ## Default S3 method: nscumcomp(x, ncomp = min(dim(x)), omega = rep(1, nrow(x)), k = d * ncomp, nneg = FALSE, gamma = 0, center = TRUE, scale. = FALSE, nrestart = 5, em_tol = 0.001, em_maxiter = 20, verbosity = 0, ...) ## S3 method for class 'formula' nscumcomp(formula, data = NULL, subset, na.action, ...)
x |
a numeric matrix or data frame which provides the data for the analysis. |
... |
arguments passed to or from other methods. |
ncomp |
the number of principal components (PCs) to be computed. The
default is to compute a full basis for |
omega |
a vector with as many entries as there are data samples, to perform weighted PCA (analogous to weighted least-squares regression). The default is an equal weighting of all samples. |
k |
an upper bound on the total number of non-zero loadings of the
pseudo-rotation matrix |
nneg |
a logical value indicating whether the loadings should be non-negative, i.e. the PAs should be constrained to the non-negative orthant. |
gamma |
a non-negative penalty on the divergence from orthonormality of the pseudo-rotation matrix. The default is not to penalize, but a positive value is sometimes necessary to avoid PAs collapsing onto each other. |
center |
a logical value indicating whether the empirical mean of (the
columns of) |
scale. |
a logical value indicating whether the columns of |
nrestart |
the number of random restarts for computing the pseudo-rotation matrix via expectation-maximization (EM) iterations. The solution achieving the minimum of the objective function over all random restarts is kept. A value greater than one can help to avoid poor local minima. |
em_tol |
If the relative change of the objective is less than
|
em_maxiter |
the maximum number of EM iterations to be performed. The EM
procedure is terminated if either the |
verbosity |
an integer specifying the verbosity level. Greater values result in more output, the default is to be quiet. |
formula |
a formula with no response variable, referring only to numeric variables. |
data |
an optional data frame (or similar: see
|
subset |
an optional vector used to select rows (observations) of the
data matrix |
na.action |
a function which indicates what should happen
when the data contain |
nscumcomp
computes all PCs jointly using expectation-maximization (EM)
iterations. The M-step is equivalent to minimizing the objective function
w.r.t. the pseudo-rotation matrix , where
is the scores matrix modified to account for the non-orthogonality of
,
is the identity matrix and
gamma
is the Lagrange parameter associated with the ortho-normality
penalty on . Non-negativity of the loadings is achieved by
enforcing a zero lower bound in the L-BFGS-B algorithm used for the
minimization of the objective, and sparsity is achieved by a subsequent soft
thresholding of
.
nscumcomp
returns a list with class (nsprcomp, prcomp)
containing the following elements:
sdev |
the additional standard
deviation explained by each component, see |
rotation |
the matrix of non-negative and/or sparse loadings, containing the principal axes as columns. |
x |
the scores matrix
|
center , scale
|
the
centering and scaling used, or |
xp |
the deflated data
matrix corresponding to |
q |
an orthonormal basis for the principal subspace |
The components are returned in order of decreasing variance for convenience.
The PCA terminology is not consistent across the literature. Given a
zero mean data matrix (with observations as rows) and a
basis
of the principal subspace, we define the scores
matrix as
which contains the principal
components as its columns. The columns of the pseudo-rotation matrix
are called the principal axes, and the elements of
are called the loadings.
if (requireNamespace("MASS", quietly = TRUE)) withAutoprint({ set.seed(1) # Regular PCA, with tolerance set to return five PCs pca <- prcomp(MASS::Boston, tol = 0.35, scale. = TRUE) cumsum(pca$sdev[1:5]) # Sparse cumulative PCA with five components and a total of 20 non-zero loadings. # The orthonormality penalty is set to a value which avoids co-linear principal # axes. Note that the non-zero loadings are not distributed uniformly over # the components. scc <- nscumcomp(MASS::Boston, ncomp = 5, k = 20, gamma = 1e4, scale. = TRUE) cumsum(scc$sdev) cardinality(scc$rotation) # Non-negative sparse cumulative PCA nscumcomp(MASS::Boston, ncomp = 5, nneg = TRUE, k = 20, gamma = 1e4, scale. = TRUE) })
if (requireNamespace("MASS", quietly = TRUE)) withAutoprint({ set.seed(1) # Regular PCA, with tolerance set to return five PCs pca <- prcomp(MASS::Boston, tol = 0.35, scale. = TRUE) cumsum(pca$sdev[1:5]) # Sparse cumulative PCA with five components and a total of 20 non-zero loadings. # The orthonormality penalty is set to a value which avoids co-linear principal # axes. Note that the non-zero loadings are not distributed uniformly over # the components. scc <- nscumcomp(MASS::Boston, ncomp = 5, k = 20, gamma = 1e4, scale. = TRUE) cumsum(scc$sdev) cardinality(scc$rotation) # Non-negative sparse cumulative PCA nscumcomp(MASS::Boston, ncomp = 5, nneg = TRUE, k = 20, gamma = 1e4, scale. = TRUE) })
Performs a constrained principal component analysis,
where non-negativity and/or sparsity constraints are enforced on the principal axes.
The result is returned as an object of class nsprcomp
, which inherits from
prcomp
.
nsprcomp(x, ...) ## Default S3 method: nsprcomp(x, retx = TRUE, ncomp = min(dim(x)), omega = rep(1, nrow(x)), k = ncol(x), nneg = FALSE, center = TRUE, scale. = FALSE, tol = NULL, nrestart = 5, em_tol = 0.001, em_maxiter = 100, partial_model = NULL, verbosity = 0, ...) ## S3 method for class 'formula' nsprcomp(formula, data = NULL, subset, na.action, ...)
nsprcomp(x, ...) ## Default S3 method: nsprcomp(x, retx = TRUE, ncomp = min(dim(x)), omega = rep(1, nrow(x)), k = ncol(x), nneg = FALSE, center = TRUE, scale. = FALSE, tol = NULL, nrestart = 5, em_tol = 0.001, em_maxiter = 100, partial_model = NULL, verbosity = 0, ...) ## S3 method for class 'formula' nsprcomp(formula, data = NULL, subset, na.action, ...)
x |
a numeric matrix or data frame which provides the data for the principal component analysis. |
... |
arguments passed to or from other methods. |
retx |
a logical value indicating whether the principal components, i.e.
|
ncomp |
the number of principal components (PCs) to be computed. With
the default setting, PCs are computed until |
omega |
a vector with as many entries as there are data samples, to perform weighted PCA (analogous to weighted least-squares regression). The default is an equal weighting of all samples. |
k |
either a scalar or a vector of length |
nneg |
a logical value indicating whether the loadings should be non-negative, i.e. the PAs should be constrained to the non-negative orthant. |
center |
a logical value indicating whether the empirical mean of (the
columns) of |
scale. |
a logical value indicating whether the columns of |
tol |
a threshold indicating the magnitude below which components should
be omitted. Components are omitted if their standard deviations are less
than or equal to |
nrestart |
the number of random restarts for computing the principal component via expectation-maximization (EM) iterations. The solution achieving maximum standard deviation over all random restarts is kept. A value greater than one can help to avoid poor local maxima. |
em_tol |
If the relative change of the objective is less than
|
em_maxiter |
the maximum number of EM iterations to be performed. The EM
procedure is terminated if either the |
partial_model |
|
verbosity |
an integer specifying the verbosity level. Greater values result in more output, the default is to be quiet. |
formula |
a formula with no response variable, referring only to numeric variables. |
data |
an optional data frame (or similar: see
|
subset |
an optional vector used to select rows (observations) of the
data matrix |
na.action |
a function which indicates what should happen
when the data contain |
nsprcomp
computes a principal component (PC) using expectation-maximization
iterations, where non-negativity of the loadings is achieved by projecting
the principal axis (PA) into the non-negative orthant, and sparsity of the
loadings is achieved by soft thresholding (Sigg and Buhmann, 2008).
Because constrained principal axes no longer correspond to true eigenvectors of the covariance matrix and are usually not pairwise orthogonal, special attention needs to be paid when computing more than a single PC. The algorithm implements the generalized deflation method proposed by Mackey (2009) to maximize the additional variance of each component. Given a basis of the space spanned by the previous PAs, the variance of the PC is maximized after projecting the current PA to the ortho-complement space of the basis. This procedure maximizes the additional variance not explained by previous components, and is identical to standard PCA if no sparsity or non-negativity constraints are enforced on the PAs.
See the references for further details.
nsprcomp
returns a list with class (nsprcomp, prcomp)
containing the following elements:
sdev |
the additional standard
deviation explained by each component, see |
rotation |
the matrix of non-negative and/or sparse loadings, containing the principal axes as columns. |
x |
the scores matrix
|
center , scale
|
the centering and
scaling used, or |
xp |
the deflated data matrix
corresponding to |
q |
an orthonormal basis for the principal subspace |
The PCA terminology is not consistent across the literature. Given a
zero mean data matrix (with observations as rows) and a
basis
of the principal subspace, we define the scores
matrix as
which contains the principal
components as its columns. The columns of the pseudo-rotation matrix
are called the principal axes, and the elements of
are called the loadings.
Deflating the data matrix accumulates numerical errors over successive PCs.
Sigg, C. D. and Buhmann, J. M. (2008) Expectation-Maximization for Sparse and Non-Negative PCA. In Proceedings of the 25th International Conference on Machine Learning (pp. 960–967).
Mackey, L. (2009) Deflation Methods for Sparse PCA. In Advances in Neural Information Processing Systems (pp. 1017–1024).
if (requireNamespace("MASS", quietly = TRUE)) withAutoprint({ set.seed(1) # Regular PCA, with the tolerance set to return five PCs prcomp(MASS::Boston, tol = 0.36, scale. = TRUE) # Sparse PCA with different cardinalities per component. The number of components # is derived from the length of vector k. nsprcomp(MASS::Boston, k = c(13, 7, 5, 5, 5), scale. = TRUE) # Non-negative sparse PCA with four components. Note that the principal axes # naturally have a high degree of orthogonality, because each component # maximizes the additional variance not already explained. set.seed(1) nsprcomp(MASS::Boston, k = c(7, 5, 2, 2), nneg = TRUE, scale. = TRUE) # The optimization can get stuck in local optima. Increase the number of # random restarts or the number of power iterations to likely obtain decreasing # standard deviations. set.seed(1) (nspc <- nsprcomp(MASS::Boston, k = c(7, 5, 2, 2), nneg = TRUE, scale. = TRUE, nrestart = 10, em_tol = 1e-4, verbosity = 1)) # continue the computation of components from a partial model nsprcomp(MASS::Boston, k = 3, ncomp = 5, nneg = TRUE, scale. = TRUE, partial_model = nspc) # The reconstruction error for each sample can be influenced using the # weighting vector omega. To reconstruct the data, the generalized # inverse of the pseudo-rotation matrix has to be used, because the constrained # principal axes are in general not pairwise orthogonal. set.seed(1) X <- matrix(runif(5*10), 5) nspc <- nsprcomp(X, omega = c(5, 1, 1, 1, 5), ncomp = 2, nneg = TRUE) X_hat <- predict(nspc)%*%MASS::ginv(nspc$rotation) + matrix(1,5,1)%*%nspc$center rowSums((X - X_hat)^2) })
if (requireNamespace("MASS", quietly = TRUE)) withAutoprint({ set.seed(1) # Regular PCA, with the tolerance set to return five PCs prcomp(MASS::Boston, tol = 0.36, scale. = TRUE) # Sparse PCA with different cardinalities per component. The number of components # is derived from the length of vector k. nsprcomp(MASS::Boston, k = c(13, 7, 5, 5, 5), scale. = TRUE) # Non-negative sparse PCA with four components. Note that the principal axes # naturally have a high degree of orthogonality, because each component # maximizes the additional variance not already explained. set.seed(1) nsprcomp(MASS::Boston, k = c(7, 5, 2, 2), nneg = TRUE, scale. = TRUE) # The optimization can get stuck in local optima. Increase the number of # random restarts or the number of power iterations to likely obtain decreasing # standard deviations. set.seed(1) (nspc <- nsprcomp(MASS::Boston, k = c(7, 5, 2, 2), nneg = TRUE, scale. = TRUE, nrestart = 10, em_tol = 1e-4, verbosity = 1)) # continue the computation of components from a partial model nsprcomp(MASS::Boston, k = 3, ncomp = 5, nneg = TRUE, scale. = TRUE, partial_model = nspc) # The reconstruction error for each sample can be influenced using the # weighting vector omega. To reconstruct the data, the generalized # inverse of the pseudo-rotation matrix has to be used, because the constrained # principal axes are in general not pairwise orthogonal. set.seed(1) X <- matrix(runif(5*10), 5) nspc <- nsprcomp(X, omega = c(5, 1, 1, 1, 5), ncomp = 2, nneg = TRUE) X_hat <- predict(nspc)%*%MASS::ginv(nspc$rotation) + matrix(1,5,1)%*%nspc$center rowSums((X - X_hat)^2) })
peav
computes the percentage of the explained _additional_ variance of each
principal component, taking into account the possible non-orthogonality of
the pseudo-rotation matrix .
peav(x, w, center = TRUE, scale. = FALSE)
peav(x, w, center = TRUE, scale. = FALSE)
x |
a numeric data matrix with the observations as rows |
w |
a numeric data matrix with the principal axes as columns |
center |
a logical value indicating whether the empirical mean of
|
scale. |
a logical value indicating whether the columns of |
The explained additional variance is computed using asdev
and
divided by the total variance of the data to obtain percentages.
sum(peav(x, w))
is equal to one if is an orthonormal
basis, e.g. the rotation matrix of a standard PCA.
peav
is useful to compare the solutions of various constrained PCA
methods w.r.t. standard PCA.
The method produces different results than the "percentage explained
variance" (pev
) computed by the spca
function from the
elasticnet
package.