Deep Learning
- Website
- Acknowledgments
- Notation
- Introduction
- I Applied Math and Machine Learning Basics
- Linear Algebra
- Scalars, Vectors, Matrices and Tensors
- Multiplying Matrices and Vectors
- Identity and Inverse Matrices
- Linear Dependence and Span
- Norms
- Special Kinds of Matrices and Vectors
- Eigendecomposition
- Singular Value Decomposition
- The Moore-Penrose Pseudoinverse
- The Trace Operator
- The Determinant
- Example: Principal Components Analysis
- Probability and Information Theory
- Why Probability?
- Random Variables
- Probability Distributions
- Marginal Probability
- Conditional Probability
- The Chain Rule of Conditional Probabilities
- Independence and Conditional Independence
- Expectation, Variance and Covariance
- Common Probability Distributions
- Useful Properties of Common Functions
- Bayes’ Rule
- Technical Details of Continuous Variables
- Information Theory
- Structured Probabilistic Models
- Numerical Computation
- Machine Learning Basics
- Learning Algorithms
- Capacity, Overfitting and Underfitting
- Hyperparameters and Validation Sets
- Estimators, Bias and Variance
- Maximum Likelihood Estimation
- Bayesian Statistics
- Supervised Learning Algorithms
- Unsupervised Learning Algorithms
- Stochastic Gradient Descent
- Building a Machine Learning Algorithm
- Challenges Motivating Deep Learning
- II Deep Networks: Modern Practices
- Deep Feedforward Networks
- Regularization for Deep Learning
- Parameter Norm Penalties
- Norm Penalties as Constrained Optimization
- Regularization and Under-Constrained Problems
- Dataset Augmentation
- Noise Robustness
- Semi-Supervised Learning
- Multi-Task Learning
- Early Stopping
- Parameter Tying and Parameter Sharing
- Sparse Representations
- Bagging and Other Ensemble Methods
- Dropout
- Adversarial Training
- Tangent Distance, Tangent Prop, and Manifold Tangent Classifier
- Optimization for Training Deep Models
- Convolutional Networks
- The Convolution Operation
- Motivation
- Pooling
- Convolution and Pooling as an Infinitely Strong Prior
- Variants of the Basic Convolution Function
- Structured Outputs
- Data Types
- Efficient Convolution Algorithms
- Random or Unsupervised Features
- The Neuroscientific Basis for Convolutional Networks
- Convolutional Networks and the History of Deep Learning
- Sequence Modeling: Recurrent and Recursive Nets
- Unfolding Computational Graphs
- Recurrent Neural Networks
- Bidirectional RNNs
- Encoder-Decoder Sequence-to-Sequence Architectures
- Deep Recurrent Networks
- Recursive Neural Networks
- The Challenge of Long-Term Dependencies
- Echo State Networks
- Leaky Units and Other Strategies for Multiple Time Scales
- The Long Short-Term Memory and Other Gated RNNs
- Optimization for Long-Term Dependencies
- Explicit Memory
- Practical Methodology
- Applications
- III Deep Learning Research
- Linear Factor Models
- Autoencoders
- Representation Learning
- Structured Probabilistic Models for Deep Learning
- Monte Carlo Methods
- Confronting the Partition Function
- Approximate Inference
- Deep Generative Models
- Boltzmann Machines
- Restricted Boltzmann Machines
- Deep Belief Networks
- Deep Boltzmann Machines
- Boltzmann Machines for Real-Valued Data
- Convolutional Boltzmann Machines
- Boltzmann Machines for Structured or Sequential Outputs
- Other Boltzmann Machines
- Back-Propagation through Random Operations
- Directed Generative Nets
- Drawing Samples from Autoencoders
- Generative Stochastic Networks
- Other Generation Schemes
- Evaluating Generative Models
- Conclusion
- Bibliography
- Index
Website
Acknowledgments
Notation
This section provides a concise reference describing the notation used throughout this book. If you are unfamiliar with any of the corresponding mathematical concepts, we describe most of these ideas in chapters 2–4.
Numbers and Arrays

| Notation | Meaning |
| --- | --- |
| $a$ | A scalar (lowercase, italic) |
| $\vec{a}$ | A vector (lowercase, bold) |
| $\mathbf{A}$ | A matrix (uppercase, bold) |
| $\mathsf{A}$ | A tensor (uppercase, bold, sans-serif) |
| $\mathbf{I}_n$ | Identity matrix with $n$ rows and $n$ columns |
| $\mathbf{I}$ | Identity matrix with dimensionality implied by context |
| $\vec{e}^{(i)}$ | Standard basis vector $[0, \ldots, 0, 1, 0, \ldots, 0]$ with a 1 at position $i$ |
| $\mathrm{diag}(\vec{a})$ | A square, diagonal matrix with diagonal entries given by $\vec{a}$ |
| $\mathrm{a}$ | A scalar random variable (upright type) |
| $\mathbf{a}$ | A vector-valued random variable (upright type) |
| $\mathbf{A}$ | A matrix-valued random variable (upright type) |

Sets and Graphs

| Notation | Meaning |
| --- | --- |
| $\mathbb{A}$ | A set |
| $\mathbb{R}$ | The set of real numbers |
| $\{0, 1\}$ | The set containing 0 and 1 |
| $\{0, 1, \ldots, n\}$ | The set of all integers between 0 and $n$ |
| $[a, b]$ | The real interval including $a$ and $b$ |
| $(a, b]$ | The real interval excluding $a$ but including $b$ |
| $\mathbb{A} \backslash \mathbb{B}$ | Set subtraction, i.e., the set containing the elements of $\mathbb{A}$ that are not in $\mathbb{B}$ |
| $\mathcal{G}$ | A graph |
| $Pa_{\mathcal{G}}(x_i)$ | The parents of $x_i$ in $\mathcal{G}$ |

Indexing

| Notation | Meaning |
| --- | --- |
| $a_i$ | Element $i$ of vector $\vec{a}$, with indexing starting at 1 |
| $a_{-i}$ | All elements of vector $\vec{a}$ except for element $i$ |
| $A_{i,j}$ | Element $i, j$ of matrix $\mathbf{A}$ |
| $\mathbf{A}_{i,:}$ | Row $i$ of matrix $\mathbf{A}$ |
| $\mathbf{A}_{:,i}$ | Column $i$ of matrix $\mathbf{A}$ |
| $\mathsf{A}_{i,j,k}$ | Element $(i, j, k)$ of a 3-D tensor $\mathsf{A}$ |
| $\mathsf{A}_{:,:,i}$ | 2-D slice of a 3-D tensor |
| $\mathrm{a}_i$ | Element $i$ of the random vector $\mathbf{a}$ |

Linear Algebra Operations

| Notation | Meaning |
| --- | --- |
| $\mathbf{A}^\top$ | Transpose of matrix $\mathbf{A}$ |
| $\mathbf{A}^+$ | Moore-Penrose pseudoinverse of $\mathbf{A}$ |
| $\mathbf{A} \odot \mathbf{B}$ | Element-wise (Hadamard) product of $\mathbf{A}$ and $\mathbf{B}$ |
| $\det(\mathbf{A})$ | Determinant of $\mathbf{A}$ |
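Much of this notation maps directly onto array libraries. The following is a brief sketch, not part of the book, assuming NumPy purely for illustration; note that NumPy indexes from 0, while the book's indexing starts at 1.

```python
import numpy as np

# A 3x3 matrix A and a vector a (NumPy indices start at 0, the book's start at 1).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
a = np.array([2.0, 5.0, 7.0])

A_T = A.T                    # A^T   : transpose of A
A_pinv = np.linalg.pinv(A)   # A^+   : Moore-Penrose pseudoinverse of A
hadamard = A * A             # A ⊙ B : element-wise (Hadamard) product
det_A = np.linalg.det(A)     # det(A): determinant of A

D = np.diag(a)               # diag(a): square, diagonal matrix with diagonal entries from a
I3 = np.eye(3)               # I_3    : identity matrix with 3 rows and 3 columns

row_1 = A[0, :]              # A_{1,:} in the book's 1-based notation: row 1 of A
col_1 = A[:, 0]              # A_{:,1}: column 1 of A
T = np.arange(24).reshape(2, 3, 4)
slice_1 = T[:, :, 0]         # A_{:,:,1}: 2-D slice of a 3-D tensor
```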