Nonnegative Matrix Factorization, 1

The aim of this post is to highlight the utility of non-negative factorization (NMF) in data analysis through examples. In a different post, we’ll talk theory and implementation by thinking of NMF as constrained optimization.

Learning parts of a whole

When Lee and Seung [Lee99] re-introduced non-negative matrix factorization (NMF), they emphasized on the algorithm’s ability to learn parts of a whole. Continue reading

Obituaries: Harold Kuhn (1925–2014)

Kuhn enrolled at Princeton in the fall of 1947. He wrote his doctoral dissertation in group theory under the direction of Ralph Fox. Concurrently, he joined mathematics professor A.W. Tucker and fellow graduate student David Gale in a hastily organized summer project to study the suspected equivalence between linear programming and matrix game theory. That project, he later wrote, “set the course of my subsequent academic career, which has centered around the applications of mathematics to economics.” In 1980, the three shared the John von Neumann Theory Prize of the Operations Research Society of America (now part of INFORMS) for their pioneering work in game theory and optimization.

Obituaries: Harold Kuhn (1925–2014).

Physical Review Letters, Past

Tags

As part of the celebration of PRL’s 50th anniversary, we will be presenting throughout 2008 a series of milestone Letters that made long-lived contributions to physics, either by announcing significant discoveries, or by initiating new areas of research. A number of these articles report on work that was later recognized with a Nobel Prize for one or more of the authors. Starting the week of January 2, we will present a few important Letters from PRL in 1958, and the next week from 1959, etc., continuing up through the year 2000.

http://journals.aps.org/prl/50years/milestones

Accelerated first-order methods for regularization

Tags

, , ,

FISTA

Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [Beck09] was one of the first algorithms to use Nesterov’s accelerated gradient descent to speed up the convergence of iterative shrinkage-thresholding (ISTA) from O(β/ε) to O(√(β/ε)).  The algorithm and proof sketch for in the previous post based were based on [Bubeck14] and [Beck09] so we’ll skip them. Instead give an R implementation and results that compares ISTA and FISTA performance (in terms of number of iterations) for various step sizes (iterations increase and step size decreases). Continue reading

Accelerating first-order methods

The lower bound on the oracle complexity of continuously differentiable, β-smooth convex function is O(1/√ε) [Theorem 2.1.6, Nesterov04; Theorem 3.8, Bubeck14; Nesterov08]. General first-order gradient descent does not achieve this – e.g. L1-Prox, or ISTA, achieves O(1/ε). Nesterov, in a series of papers [Nesterov83, Nesterov88, Nesterov07], proposed techniques to improve the convergence rate for smooth functions to O(1/t2). In this post, we discuss Nesterov acceleration. Continue reading