Original filename: RomanLyapin_MAB18_Abstract.pdf
Deep architectures for Bayesian optimization
February 14, 2018
One of the main challenges in running an e-commerce website lies in developing and maintaining the relevance of its content. At Booking.com the typical approach to this problem is to A/B test each new feature or deployed model to prove that it improves the customer experience. However, such a setting may not be optimal for non-binary features or for interactions between different website components, because we have to split the traffic between all considered variants, which hurts the power of the statistical tests. A more efficient alternative would be dynamic traffic allocation, where we focus more of the traffic on the variants that initially perform better.
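Dynamic traffic allocation of this kind is often implemented with a bandit policy. As a minimal sketch (not the method described in the abstract), Thompson sampling with a Beta-Bernoulli model routes each visitor to the variant whose sampled conversion-rate belief is highest; the prior counts and the example numbers below are illustrative assumptions.

```python
import numpy as np

def thompson_allocate(successes, failures, n_visitors, rng):
    """Allocate visitors across variants via Thompson sampling (Beta-Bernoulli)."""
    counts = np.zeros(len(successes), dtype=int)
    for _ in range(n_visitors):
        # Draw one plausible conversion rate per variant from its Beta posterior
        # (uniform Beta(1, 1) prior) and send the visitor to the apparent best.
        draws = rng.beta(successes + 1, failures + 1)
        counts[np.argmax(draws)] += 1
    return counts

rng = np.random.default_rng(0)
# Variant 1 has a much higher observed conversion rate (20% vs 5%),
# so it should end up receiving most of the traffic.
alloc = thompson_allocate(np.array([50, 200]), np.array([950, 800]), 1000, rng)
```

Unlike a fixed 50/50 split, the allocation concentrates on the stronger variant as evidence accumulates, which is exactly the exploration-exploitation trade-off discussed next.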
This formulation fits nicely within the exploration-exploitation paradigm, so we can apply methods from Bayesian optimization to it. The benefits include an essentially non-parametric treatment of our data and analytical expressions for our posteriors and decision rules (see Srinivas et al., 2010 or Snoek et al., 2012). The downsides include the high computational complexity of the algorithms and the overall difficulty of choosing a reasonable prior over functions, i.e. the kernel and hyperparameter choices in the underlying Gaussian process (GP), especially for multidimensional domains.
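The analytical posteriors and decision rules mentioned above can be made concrete with the GP-UCB rule of Srinivas et al. (2010): compute the GP posterior mean and variance in closed form, then query the point maximizing mean plus a multiple of the standard deviation. The sketch below assumes a unit squared-exponential kernel and illustrative hyperparameters (length-scale, beta, noise), not the choices used in the presented work.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel with unit variance; ls is an assumed length-scale.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_ucb_next(x_obs, y_obs, grid, beta=2.0, noise=1e-4):
    """Pick the next query point by the GP-UCB rule: argmax of mu + beta * sigma."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(grid, x_obs)
    K_inv = np.linalg.inv(K)
    mu = Ks @ K_inv @ y_obs                                  # posterior mean
    var = 1.0 - np.einsum("ij,jk,ik->i", Ks, K_inv, Ks)      # posterior variance
    return grid[np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0)))]

x_obs = np.array([0.1, 0.9])
y_obs = np.array([0.2, 0.8])
grid = np.linspace(0.0, 1.0, 101)
x_next = gp_ucb_next(x_obs, y_obs, grid)
```

The cubic cost of the matrix inverse in the number of observations is the computational complexity referred to above, and every modeling decision is packed into the kernel, which is exactly the prior-choice difficulty the abstract raises.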
Several recent papers approach the latter issue by proposing new ways to infer from data wider classes of kernels underlying the GP, or the GPs themselves. One such example stacks several basic GPs on top of each other (Damianou and Lawrence, 2012). Others follow the seminal result (Neal, 1994) linking GPs with infinite-width neural networks and examine potential kernels for more practical networks with less width and more depth (Lee et al., 2017), while some (Wilson et al., 2016) combine the two approaches. The presented work examines whether these new GP inference methods help to improve our performance during Bayesian optimization.
To keep it concrete and manageable, we reduce the scope of the original, bigger problem to optimization with respect to a 1D signal with a moving seasonal component. Without any further refinements, such a setting breaks the assumptions underlying standard 1D Bayesian optimization and forces us to operate in a 2D domain. We compare the performance we get on synthetic problems using deep architectures against baselines that fit a separate 1D signal for each season using standard kernels, and consider possible applications using Booking.com data.
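One conventional way to handle the 2D seasonal domain with standard kernels, in the spirit of the baselines above, is a product kernel: periodic in the season coordinate, squared-exponential in the signal coordinate. This is a hedged sketch; the kernel form, the weekly period, and the length-scales are illustrative assumptions, not the configuration from the presented work.

```python
import numpy as np

def seasonal_kernel(t1, x1, t2, x2, period=52.0, ls_t=4.0, ls_x=0.5):
    """Product of a periodic kernel in season t and an RBF kernel in the signal x.

    period, ls_t and ls_x are illustrative hyperparameters: a 52-week cycle with
    assumed length-scales, not values fitted to real traffic.
    """
    dt = t1[:, None] - t2[None, :]
    dx = x1[:, None] - x2[None, :]
    k_season = np.exp(-2.0 * np.sin(np.pi * dt / period) ** 2 / ls_t ** 2)
    k_signal = np.exp(-0.5 * (dx / ls_x) ** 2)
    return k_season * k_signal

t = np.array([0.0, 13.0, 26.0])   # season coordinate (weeks)
x = np.array([0.1, 0.5, 0.9])     # 1D signal coordinate
K = seasonal_kernel(t, x, t, x)
```

A kernel like this shares statistical strength across seasons through the periodic factor, whereas the per-season 1D baselines treat each season independently; that difference is what the comparison against deep architectures probes.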