Stacking (Stacked Generalization)
In ensemble learning, besides Bagging and Boosting, which partition the data horizontally, there is also a vertical (depth-adding) approach, generally known as Stacked Generalization (SG).
How did SG come about? The first to emphasize and propose the stacking technique was David H. Wolpert, who published the SG paper in 1992. Stacking can be viewed as a clever extension of cross-validation: picking a single model by winner-takes-all is the degenerate special case, while stacking instead learns how to combine the candidate models. Wolpert is a scholar active in three fields: mathematics, physics, and computer science. He is better known for the No-Free-Lunch (NFL) theorems he proposed in 1995. The NFL idea is intuitive: differences between algorithms mostly come down to how well they suit the problem you are trying to solve. When comparing algorithms, a method M1 may suit a problem P1, but no method can suit every problem.
If stacking is still not a clear concept to you, see Wikipedia: its references 23-29 are the articles on stacking, and reference 25, a 2013 paper, appears to be the most recent of them, so it probably deserves the most attention.
Stacking
Stacking (sometimes called stacked generalization) involves training a learning algorithm to combine the predictions of several other learning algorithms. First, all of the other algorithms are trained using the available data; then a combiner algorithm is trained to make a final prediction using all the predictions of the other algorithms as additional inputs. If an arbitrary combiner algorithm is used, then stacking can theoretically represent any of the ensemble techniques described in that article (the Wikipedia entry on ensemble learning), although, in practice, a logistic regression model is often used as the combiner.
Stacking typically yields performance better than any single one of the trained models.[23] It has been successfully used on both supervised learning tasks (regression,[24] classification and distance learning [25]) and unsupervised learning (density estimation).[26] It has also been used to estimate bagging's error rate.[3][27] It has been reported to out-perform Bayesian model-averaging.[28] The two top performers in the Netflix competition utilized blending, which may be considered to be a form of stacking.[29]
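The two-level procedure the quote describes (level-0 base learners producing out-of-fold predictions, a logistic-regression combiner at level 1) can be sketched in pure Python. Everything below is illustrative: the synthetic data, the deliberately naive threshold base learners, and the hand-rolled logistic regression are assumptions of this sketch, not part of any library or of Wolpert's original formulation.

```python
# Minimal sketch of stacked generalization (stacking), pure stdlib Python.
# Level 0: two naive base learners; Level 1: a tiny logistic regression
# combiner trained on the base learners' out-of-fold predictions.
import math
import random

random.seed(0)

# Synthetic 1-D binary data: class 1 tends to have larger x.
X = [random.gauss(0.0, 1.0) for _ in range(100)] + \
    [random.gauss(2.0, 1.0) for _ in range(100)]
y = [0] * 100 + [1] * 100

def fit_threshold(xs, stat):
    """Naive base learner: predict 1 when x exceeds the training mean
    (or median); the direction is hardcoded for this sketch."""
    if stat == "mean":
        t = sum(xs) / len(xs)
    else:
        t = sorted(xs)[len(xs) // 2]
    return lambda x: 1 if x > t else 0

def oof_predictions(X, y, k=5):
    """Level-0 output: each base learner's out-of-fold predictions,
    so the combiner never sees a prediction made on training data."""
    n = len(X)
    idx = list(range(n))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    Z = [[0, 0] for _ in range(n)]        # one column per base learner
    for fold in folds:
        fs = set(fold)
        Xt = [X[i] for i in idx if i not in fs]
        models = [fit_threshold(Xt, "mean"), fit_threshold(Xt, "median")]
        for i in fold:
            for j, m in enumerate(models):
                Z[i][j] = m(X[i])
    return Z

def fit_logreg(Z, y, lr=0.1, epochs=200):
    """Level-1 combiner: logistic regression on the base predictions,
    fitted by plain stochastic gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for zi, yi in zip(Z, y):
            p = 1.0 / (1.0 + math.exp(-(w[0] * zi[0] + w[1] * zi[1] + b)))
            g = p - yi                     # gradient of log-loss
            w[0] -= lr * g * zi[0]
            w[1] -= lr * g * zi[1]
            b -= lr * g
    return w, b

Z = oof_predictions(X, y)
w, b = fit_logreg(Z, y)

# Final stacked model: base learners refit on all data, then the combiner.
base = [fit_threshold(X, "mean"), fit_threshold(X, "median")]

def stacked_predict(x):
    z = [m(x) for m in base]
    p = 1.0 / (1.0 + math.exp(-(w[0] * z[0] + w[1] * z[1] + b)))
    return 1 if p > 0.5 else 0

acc = sum(stacked_predict(xi) == yi for xi, yi in zip(X, y)) / len(X)
```

In practice one would reach for an off-the-shelf implementation such as scikit-learn's StackingClassifier rather than hand-rolling these pieces; the point of the sketch is only the data flow, in particular that the combiner is trained on out-of-fold predictions.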
References:
- 1. Wikipedia: Ensemble learning;
- 2. Wolpert, D. H., "Stacked Generalization", Neural Networks, 5(2), pp. 241-259, 1992;
- 3. 史春奇, "今我来思, 堆栈泛化 (Stacked Generalization)";
- 4. Breiman, L., "Stacked Regressions", Machine Learning, 24, 1996;
- 5. Ozay, M.; Yarman Vural, F. T., "A New Fuzzy Stacked Generalization Technique and Analysis of its Performance", 2013;
- 6. Sill, J.; Takacs, G.; Mackey, L.; Lin, D., "Feature-Weighted Linear Stacking", 2009.