The principle of NMF (also called NNMF)
NMF is an alternative approach to decomposition that assumes the data and the components are non-negative. NMF can be plugged in instead of PCA or its variants in cases where the data matrix does not contain negative values. It finds a decomposition of the samples $X$ into two matrices $W$ and $H$ of non-negative elements, by optimizing the distance $d$ between $X$ and the matrix product $WH$. The most widely used distance function is the squared Frobenius norm, which is an obvious extension of the Euclidean norm to matrices:
$$d_{\mathrm{Fro}}(X, Y) = \frac{1}{2} ||X - Y||_{\mathrm{Fro}}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {Y}_{ij})^2$$
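As a minimal sketch (assuming scikit-learn and numpy are installed; the toy matrix below is made up purely for illustration), the factors $W$ and $H$ and the squared Frobenius reconstruction error can be inspected like this:

```python
import numpy as np
from sklearn.decomposition import NMF

# A small non-negative data matrix (6 samples, 4 features) -- illustrative values only.
X = np.array([[1.0, 0.5, 0.0, 2.0],
              [2.0, 1.0, 0.1, 4.1],
              [0.0, 0.2, 3.0, 0.5],
              [0.1, 0.4, 6.2, 1.0],
              [3.0, 1.4, 0.0, 6.0],
              [0.0, 0.1, 1.5, 0.3]])

model = NMF(n_components=2, init='nndsvda', random_state=0, max_iter=500)
W = model.fit_transform(X)   # samples in the latent space, shape (6, 2), non-negative
H = model.components_        # latent components, shape (2, 4), non-negative

# Squared Frobenius distance d_Fro(X, WH) = 1/2 * ||X - WH||_F^2
d_fro = 0.5 * np.sum((X - W @ H) ** 2)
print(W.min() >= 0, H.min() >= 0, d_fro)
```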
Unlike PCA, the representation of a vector is obtained in an additive fashion, by superimposing the components, without subtracting. Such additive models are efficient for representing images and text.
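A quick numpy check of this additive reconstruction, reusing `W` and `H` from the sketch above: each approximated sample is a non-negative weighted sum of the rows of `H`, with no subtraction involved.

```python
# Approximation of sample 0 as a purely additive combination of the components.
x0_hat = sum(W[0, k] * H[k] for k in range(H.shape[0]))
assert np.allclose(x0_hat, (W @ H)[0])
assert (W >= 0).all() and (H >= 0).all()  # every weight and every component is non-negative
```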
In NMF, L1 and L2 priors can be added to the loss function in order to regularize the model. The L2 prior uses the Frobenius norm, while the L1 prior uses an elementwise L1 norm. As in ElasticNet, we control the combination of L1 and L2 with the l1_ratio ($\rho$) parameter, and the intensity of the regularization with the alpha ($\alpha$) parameter. The prior terms are then:
$$\alpha \rho ||W||_1 + \alpha \rho ||H||_1
+ \frac{\alpha(1-\rho)}{2} ||W||_{\mathrm{Fro}} ^ 2
+ \frac{\alpha(1-\rho)}{2} ||H||_{\mathrm{Fro}} ^ 2$$
and the regularized objective function is:
$$d_{\mathrm{Fro}}(X, WH)
+ \alpha \rho ||W||_1 + \alpha \rho ||H||_1
+ \frac{\alpha(1-\rho)}{2} ||W||_{\mathrm{Fro}} ^ 2
+ \frac{\alpha(1-\rho)}{2} ||H||_{\mathrm{Fro}} ^ 2$$
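A minimal numpy sketch of the regularized objective above, continuing with `X`, `W`, and `H` from the earlier snippet (the values of alpha and rho here are arbitrary, chosen only for illustration):

```python
alpha, rho = 0.1, 0.5  # regularization strength and L1/L2 mix, illustrative values only

penalty = (alpha * rho * (np.abs(W).sum() + np.abs(H).sum())               # elementwise L1 terms
           + 0.5 * alpha * (1 - rho) * ((W ** 2).sum() + (H ** 2).sum()))  # squared Frobenius terms

objective = 0.5 * np.sum((X - W @ H) ** 2) + penalty  # d_Fro(X, WH) + prior terms
print(objective)
```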
From the expression above, the penalty terms can act on W, on H, or on both; the non-negativity of the factors comes from the NMF constraint itself, not from these penalties. The book Machine Learning Algorithms omits all of the penalty terms. Comparison with PCA: dropping the penalty terms leaves only the reconstruction error, which is essentially the same objective ordinary PCA minimizes; what distinguishes NMF is the non-negativity constraint on W and H.
The Frobenius norm of a matrix is not the same as the matrix 2-norm; see the blog post on norms for details.
NMF regularizes both W and H. The public function non_negative_factorization allows finer control through the regularization attribute, and may regularize only W, only H, or both.
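A hedged sketch of the function-level API, reusing `X` from the first snippet. Note that the regularization-related keyword arguments differ between scikit-learn releases (older versions take alpha/l1_ratio/regularization, newer ones take alpha_W/alpha_H/l1_ratio), so they are left at their defaults here; check the version you have installed before passing them.

```python
from sklearn.decomposition import non_negative_factorization

# Plain call without extra regularization; the regularization keyword arguments
# are version-dependent and therefore omitted in this sketch.
W2, H2, n_iter = non_negative_factorization(X, n_components=2, init='nndsvda',
                                            random_state=0, max_iter=500)
print(W2.shape, H2.shape, n_iter)
```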
Other distance functions can be used in NMF, for example the (generalized) Kullback-Leibler (KL) divergence, also referred to as I-divergence:
$$d_{KL}(X, Y) = \sum_{i,j} (X_{ij} \log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})$$
Or the Itakura-Saito (IS) divergence:
$$d_{IS}(X, Y) = \sum_{i,j} (\frac{X_{ij}}{Y_{ij}} - \log(\frac{X_{ij}}{Y_{ij}}) - 1)$$
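In scikit-learn these alternative divergences are selected through the beta_loss parameter of NMF, which requires the multiplicative-update solver. A brief sketch, reusing the `X` matrix from above:

```python
from sklearn.decomposition import NMF

# Generalized KL divergence as the loss; non-Frobenius losses require solver='mu'.
kl_model = NMF(n_components=2, solver='mu', beta_loss='kullback-leibler',
               init='nndsvda', random_state=0, max_iter=500)
W_kl = kl_model.fit_transform(X)

# beta_loss='itakura-saito' is also available, but it requires X with no zero entries
# (the IS divergence involves log(X_ij / Y_ij)), so add a small constant to X if needed.
```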
These three distances (squared Frobenius norm, KL divergence, IS divergence) are special cases of the beta-divergence family, with $\beta = 2, 1, 0$ respectively. The beta-divergence is defined by:
$$d_{\beta}(X, Y) = \sum_{i,j} \frac{1}{\beta(\beta - 1)}(X_{ij}^\beta + (\beta-1)Y_{ij}^\beta - \beta X_{ij} Y_{ij}^{\beta - 1})$$
where $\beta$ can be any real number.
Note that this definition is not valid for $\beta \in \{0, 1\}$, yet it can be continuously extended to the definitions of $d_{IS}$ and $d_{KL}$ respectively.
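A minimal numpy sketch of the beta-divergence, with the $\beta = 0$ and $\beta = 1$ limits handled as the IS and KL special cases (element values are assumed strictly positive so the logarithms are defined):

```python
import numpy as np

def beta_divergence(X, Y, beta):
    """Beta-divergence between two strictly positive arrays of the same shape."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    if beta == 0:                      # Itakura-Saito limit
        R = X / Y
        return np.sum(R - np.log(R) - 1)
    if beta == 1:                      # generalized Kullback-Leibler limit
        return np.sum(X * np.log(X / Y) - X + Y)
    return np.sum(X ** beta + (beta - 1) * Y ** beta
                  - beta * X * Y ** (beta - 1)) / (beta * (beta - 1))

# Sanity check: beta = 2 recovers the squared Frobenius distance.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.5, 1.0], [2.0, 5.0]])
assert np.isclose(beta_divergence(A, B, 2), 0.5 * np.sum((A - B) ** 2))
```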
The notes above are based on the sklearn documentation.