Sparse PCA principles

Note that there are many different formulations for the Sparse PCA problem. The one implemented here is based on Mrl09. The optimization problem solved is a PCA problem with an $\ell_1$ penalty on the components:

$$\begin{split}(W_{n \times k}^*, H_{k \times m}^*) = \underset{W_{n \times k},\, H_{k \times m}}{\operatorname{arg\,min\,}} & \frac{1}{2} \|X_{n \times m} - W_{n \times k} H_{k \times m}\|_{\text{Fro}}^2 + \alpha \|H_{k \times m}\|_1 \\ \text{subject to } & \|W_i\|_2 = 1 \text{ for all } 1 \leq i \leq k, \text{ with } k \leq m\end{split}$$
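To make the objective concrete, here is a minimal NumPy sketch that evaluates the penalized loss for arbitrary factors. The sizes, `alpha`, and random data are illustrative assumptions, not values from the source:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 100, 10, 3            # samples, features, components (assumed sizes)
X = rng.normal(size=(n, m))     # data matrix X, n x m
W = rng.normal(size=(n, k))     # transformed data W, n x k
H = rng.normal(size=(k, m))     # components H, k x m
alpha = 1.0                     # penalty strength (illustrative)

# (1/2) * ||X - W H||_Fro^2  +  alpha * ||H||_1
loss = 0.5 * np.linalg.norm(X - W @ H, "fro") ** 2 + alpha * np.abs(H).sum()
print(loss)
```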

From the formulation above, the $\ell_1$-norm penalty on H makes the learned components H sparse. It also shows that Sparse PCA can reduce dimensionality (simply choose k < m); the book 机器学习算法 is mistaken in claiming that Sparse PCA cannot reduce dimensionality. Relation to PCA: removing the penalty term essentially reduces the problem to ordinary PCA.
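To make the contrast with PCA concrete, here is a small scikit-learn sketch (the iris data, `n_components=2`, and `alpha=1.0` are arbitrary illustrative choices): both models reduce 4 features to 2 components, but only SparsePCA yields components with exact zeros.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, SparsePCA

X = load_iris().data                            # 150 samples, m = 4 features
pca = PCA(n_components=2).fit(X)                # k = 2 < m: dimension is reduced
spca = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)

print(pca.transform(X).shape, spca.transform(X).shape)   # (150, 2) for both
print(np.count_nonzero(pca.components_ == 0))   # PCA components: dense
print(np.count_nonzero(spca.components_ == 0))  # SparsePCA: exact zeros appear
```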

Note that the matrix Frobenius norm used in the objective above is not the same as the matrix 2-norm (the largest singular value); see the blog post on norms for details.
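A quick numerical illustration of the difference (the 2-by-2 matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 4.0]])
print(np.linalg.norm(A, "fro"))  # Frobenius norm: sqrt(3^2 + 4^2) = 5.0
print(np.linalg.norm(A, 2))      # matrix 2-norm: largest singular value = 4.0
```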

The sparsity-inducing $\ell_1$ norm also prevents learning components from noise when few training samples are available. The degree of penalization (and thus sparsity) can be adjusted through the hyperparameter alpha. Small values lead to a gently regularized factorization, while larger values shrink many coefficients to zero.
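The effect of alpha can be seen by sweeping a few values (these particular values are arbitrary assumptions) and counting the exact zeros in the fitted components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import SparsePCA

X = load_iris().data
for alpha in (0.1, 1.0, 10.0):                  # assumed sweep values
    spca = SparsePCA(n_components=2, alpha=alpha, random_state=0).fit(X)
    zero_frac = (spca.components_ == 0).mean()  # fraction of exact zeros in H
    print(f"alpha={alpha:>4}: {zero_frac:.0%} zero coefficients")
```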

Properties of the matrix produced by the Sparse PCA transformation (checked in the sketch after this list):

  1. its column vectors are linearly independent;
  2. its dimensionality is reduced;
  3. it is sparse.
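A small sanity check of these three points on toy data (iris; all parameter values are illustrative). Note that in scikit-learn the exact zeros appear in `components_`, i.e. the learned H:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import SparsePCA

X = load_iris().data                              # shape (150, 4)
spca = SparsePCA(n_components=2, alpha=1.0, random_state=0)
Xt = spca.fit_transform(X)                        # shape (150, 2)

print(np.linalg.matrix_rank(Xt) == Xt.shape[1])   # 1. columns linearly independent
print(Xt.shape[1] < X.shape[1])                   # 2. dimensionality reduced
print(bool((spca.components_ == 0).any()))        # 3. sparsity (exact zeros in H)
```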

The notes above are based on the sklearn documentation.