CONTENTS

6.3  Hidden Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
6.4  Architecture Design . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.5  Back-Propagation and Other Differentiation Algorithms . . . . . 203
6.6  Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

7 Regularization for Deep Learning                                     228
7.1  Parameter Norm Penalties . . . . . . . . . . . . . . . . . . . . . . 230
7.2  Norm Penalties as Constrained Optimization . . . . . . . . . . . 237
7.3  Regularization and Under-Constrained Problems . . . . . . . . . 239
7.4  Dataset Augmentation . . . . . . . . . . . . . . . . . . . . . . . . 240
7.5  Noise Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
7.6  Semi-Supervised Learning . . . . . . . . . . . . . . . . . . . . . . 244
7.7  Multi-Task Learning . . . . . . . . . . . . . . . . . . . . . . . . . 245
7.8  Early Stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
7.9  Parameter Tying and Parameter Sharing . . . . . . . . . . . . . . 251
7.10 Sparse Representations . . . . . . . . . . . . . . . . . . . . . . . 253
7.11 Bagging and Other Ensemble Methods . . . . . . . . . . . . . . . 255
7.12 Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
7.13 Adversarial Training . . . . . . . . . . . . . . . . . . . . . . . . . 267
7.14 Tangent Distance, Tangent Prop, and Manifold Tangent Classifier 268

8 Optimization for Training Deep Models                                274
8.1  How Learning Differs from Pure Optimization . . . . . . . . . . . 275
8.2  Challenges in Neural Network Optimization . . . . . . . . . . . . 282
8.3  Basic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
8.4  Parameter Initialization Strategies . . . . . . . . . . . . . . . . . 301
8.5  Algorithms with Adaptive Learning Rates . . . . . . . . . . . . . 306
8.6  Approximate Second-Order Methods . . . . . . . . . . . . . . . . 310
8.7  Optimization Strategies and Meta-Algorithms . . . . . . . . . . . 318

9 Convolutional Networks                                               331
9.1  The Convolution Operation . . . . . . . . . . . . . . . . . . . . . 332
9.2  Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
9.3  Pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
9.4  Convolution and Pooling as an Infinitely Strong Prior . . . . . . 346
9.5  Variants of the Basic Convolution Function . . . . . . . . . . . . 348
9.6  Structured Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . 359
9.7  Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
9.8  Efficient Convolution Algorithms . . . . . . . . . . . . . . . . . . 363
9.9  Random or Unsupervised Features . . . . . . . . . . . . . . . . . 364