cs267 hw0 cyclades (1) .pdf

File information

This PDF 1.5 document has been generated by TeX / pdfTeX-1.40.17, and has been sent on pdf-archive.com on 25/01/2017 at 07:56, from IP address 108.205.x.x. The current document download page has been viewed 277 times.
File size: 85.84 KB (2 pages).
Privacy: public file

Document preview

CS267 HW0: Cyclades
David Winer
January 25, 2017

Personal bio
My name is David Winer, and I am a Masters student in EECS. I studied Applied Math
at Brown for undergrad. In this course, I am hoping to learn more about applications of
parallel computing in industry and research, with specific focus in modern data science and
machine learning. I am currently working with Lee Fleming (IEOR) on applications of
natural language processing (NLP) techniques to US patent data.

Application description
Cyclades is a general framework for parallelizing a family of machine learning algorithms:
stochastic update (SU) algorithms [1]. It builds off of the work of Hogwild!, an update
scheme that allows for parallel stochastic gradient descent (SGD) updates without the use of
locks [2]. It is more general than Hogwild!, however, since it can be applied to any arbitrary
SU algorithm, not just stochastic gradient descent (e.g., stochastic variance reduced gradient,
SAGA). The trick to the Cyclades approach is allocating related updates u1 , . . . , un to cores
in a way that minimizes conflicts. Given certain sparsity constraints, Cyclades can achieve
nearly linear speedup (while guaranteeing similar convergence behavior as you would expect
from a serial algorithm).

Target platform and performance
The initial Cyclades implementation was developed in C++ and tested on a 72-CPU
machine spread across 4 NUMA nodes. Each node had 18 CPUs and 1 TB of memory. It was
tested on four machine learning tasks (each of which employs an SU algorithm): least squares,
computing the top eigenvector of AT A for an adjacency matrix A, matrix completion, and
semantic word embedding (Word2Vec). Each task was tested with a maximum of 18 threads
(i.e., within one NUMA node). Across all tasks, Cyclades achieved significant speedup
over a serial approach: between 3x (graph eigenvectors) and 12x (word embedding) on 18
threads. Even more importantly, it outperformed Hogwild! solutions to each of these
problems, especially with a larger number of threads (> 2 − 4).


The main limitation associated with this approach is the sparsity constraints required for
Cyclades to achieve meaningful speedup. For example, for a dense machine learning
problem where each update requires accessing every component of the model variable x,
Cyclades would not produce better performance than a serial algorithm.

[1] X. Pan, M. Lam, S. Tu, D. Papailiopoulos, C. Zhang, M. Jordan, K. Ramchandran, C.
Re, and B. Recht. Cyclades: Conflict-free asynchronous machine learning. NIPS, 2016.
[2] F. Niu, B. Recht, C. Re, and S. J. Wright. Hogwild! A lock-free approach to parallelizing stochastic gradient descent. NIPS, 2011.


Download original PDF file

cs267-hw0-cyclades (1).pdf (PDF, 85.84 KB)


Share on social networks

Link to this page

Permanent link

Use the permanent link to the download page to share your document on Facebook, Twitter, LinkedIn, or directly with a contact by e-Mail, Messenger, Whatsapp, Line..

Short link

Use the short link to share your document on Twitter or by text message (SMS)


Copy the following HTML code to share your document on a Website or Blog

QR Code to this page

QR Code link to PDF file cs267-hw0-cyclades (1).pdf

This file has been shared publicly by a user of PDF Archive.
Document ID: 0000542752.
Report illicit content