Deep Learning

CONTENTS

3.2   Random Variables . . . . . . . . . . . . . . . . . . . . . .  56
3.3   Probability Distributions . . . . . . . . . . . . . . . . .  56
3.4   Marginal Probability . . . . . . . . . . . . . . . . . . . .  58
3.5   Conditional Probability . . . . . . . . . . . . . . . . . .  59
3.6   The Chain Rule of Conditional Probabilities . . . . . . . .  59
3.7   Independence and Conditional Independence . . . . . . . . .  60
3.8   Expectation, Variance and Covariance . . . . . . . . . . . .  60
3.9   Common Probability Distributions . . . . . . . . . . . . . .  62
3.10  Useful Properties of Common Functions . . . . . . . . . . .  67
3.11  Bayes' Rule . . . . . . . . . . . . . . . . . . . . . . . .  70
3.12  Technical Details of Continuous Variables . . . . . . . . .  71
3.13  Information Theory . . . . . . . . . . . . . . . . . . . . .  72
3.14  Structured Probabilistic Models . . . . . . . . . . . . . .  75

4     Numerical Computation                                         80
4.1   Overflow and Underflow . . . . . . . . . . . . . . . . . . .  80
4.2   Poor Conditioning . . . . . . . . . . . . . . . . . . . . .  82
4.3   Gradient-Based Optimization . . . . . . . . . . . . . . . .  82
4.4   Constrained Optimization . . . . . . . . . . . . . . . . . .  93
4.5   Example: Linear Least Squares . . . . . . . . . . . . . . .  96

5     Machine Learning Basics                                       98
5.1   Learning Algorithms . . . . . . . . . . . . . . . . . . . .  99
5.2   Capacity, Overfitting and Underfitting . . . . . . . . . . . 110
5.3   Hyperparameters and Validation Sets . . . . . . . . . . . . 120
5.4   Estimators, Bias and Variance . . . . . . . . . . . . . . . 122
5.5   Maximum Likelihood Estimation . . . . . . . . . . . . . . . 131
5.6   Bayesian Statistics . . . . . . . . . . . . . . . . . . . . 135
5.7   Supervised Learning Algorithms . . . . . . . . . . . . . . . 139
5.8   Unsupervised Learning Algorithms . . . . . . . . . . . . . . 145
5.9   Stochastic Gradient Descent . . . . . . . . . . . . . . . . 150
5.10  Building a Machine Learning Algorithm . . . . . . . . . . . 152
5.11  Challenges Motivating Deep Learning . . . . . . . . . . . . 154

II    Deep Networks: Modern Practices                              165

6     Deep Feedforward Networks                                    167
6.1   Example: Learning XOR . . . . . . . . . . . . . . . . . . . 170
6.2   Gradient-Based Learning . . . . . . . . . . . . . . . . . . 176