In fact, the matrix formulation of Fibonacci numbers mentioned before can be generalized to represent any 2×2 matrix and, by way of continued fractions, any real number – not just the golden mean. This post is inspired by David Austin’s AMS feature column and draws on Philippe Flajolet’s beautiful survey on continued fractions to set the stage for a future post on Diophantine approximation.
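To make the matrix picture concrete, here is a small Python sketch (the helper names are mine, not from the post): powers of the Fibonacci matrix Q = [[1,1],[1,0]] generate Fibonacci numbers, and the same 2×2 product applied to partial-quotient matrices [[a,1],[1,0]] yields continued-fraction convergents. With all quotients equal to 1, the convergents are ratios of consecutive Fibonacci numbers approaching the golden mean.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two 2x2 integer matrices (stored as nested tuples)."""
    return ((A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]),
            (A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]))

def fib(n):
    """F(n) via the identity [[1,1],[1,0]]^n = [[F(n+1),F(n)],[F(n),F(n-1)]]."""
    M = ((1, 0), (0, 1))
    Q = ((1, 1), (1, 0))
    for _ in range(n):
        M = mat_mul(M, Q)
    return M[0][1]

def convergent(coeffs):
    """Convergent p/q of the continued fraction [a0; a1, a2, ...],
    read off the first column of the product of the matrices [[a,1],[1,0]]."""
    M = ((1, 0), (0, 1))
    for a in coeffs:
        M = mat_mul(M, ((a, 1), (1, 0)))
    return Fraction(M[0][0], M[1][0])
```

With `coeffs = [1] * n` the product is exactly the n-th power of the Fibonacci matrix, so `convergent([1] * n)` returns F(n+1)/F(n).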
More will be told of this story as his voluminous writings from the hinterland are read. But between 1945 and 1970 he published mathematics of unparalleled sweep and power, bringing escalations of abstraction to bear on the solution of concrete problems, and this is the part we wish to appreciate.
The aim of this post is to highlight the utility of non-negative matrix factorization (NMF) in data analysis through examples. In a different post, we’ll talk theory and implementation by thinking of NMF as constrained optimization.
Learning parts of a whole
When Lee and Seung [Lee99] re-introduced non-negative matrix factorization (NMF), they emphasized the algorithm’s ability to learn parts of a whole. Continue reading
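As a sketch of what the factorization computes, here is a minimal Python implementation of Lee and Seung’s multiplicative update rules (the function `nmf` and its parameters are illustrative, not code from the post): V ≈ WH with all three matrices non-negative, so each column of V is an additive combination of the “parts” in the columns of W.

```python
import numpy as np

def nmf(V, r, n_iter=1000, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (m x n) as W @ H with W (m x r) and
    H (r x n) non-negative, using Lee & Seung's multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates never flip signs, so non-negativity is preserved;
        # eps in the denominators guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The updates monotonically decrease the Frobenius reconstruction error, which is one reason the algorithm is so simple to use in practice.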
Kuhn enrolled at Princeton in the fall of 1947. He wrote his doctoral dissertation in group theory under the direction of Ralph Fox. Concurrently, he joined mathematics professor A.W. Tucker and fellow graduate student David Gale in a hastily organized summer project to study the suspected equivalence between linear programming and matrix game theory. That project, he later wrote, “set the course of my subsequent academic career, which has centered around the applications of mathematics to economics.” In 1980, the three shared the John von Neumann Theory Prize of the Operations Research Society of America (now part of INFORMS) for their pioneering work in game theory and optimization.
As part of the celebration of PRL’s 50th anniversary, we will be presenting throughout 2008 a series of milestone Letters that made long-lived contributions to physics, either by announcing significant discoveries, or by initiating new areas of research. A number of these articles report on work that was later recognized with a Nobel Prize for one or more of the authors. Starting the week of January 2, we will present a few important Letters from PRL in 1958, and the next week from 1959, etc., continuing up through the year 2000.
Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [Beck09] was one of the first algorithms to use Nesterov’s accelerated gradient descent to speed up the convergence of iterative shrinkage-thresholding (ISTA) from O(β/ε) to O(√(β/ε)) iterations. The algorithm and proof sketch in the previous post were based on [Bubeck14] and [Beck09], so we’ll skip them here. Instead, we give an R implementation and results comparing ISTA and FISTA performance (in terms of number of iterations) for various step sizes (iterations increase as step size decreases). Continue reading
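The post’s implementation is in R; for reference, a minimal Python sketch of the two iterations on a lasso problem (the problem setup, step size 1/β with β = ‖A‖², and all names are my own, not the post’s code) might look like:

```python
import numpy as np

def soft_threshold(x, t):
    """Prox of t * ||.||_1: shrink each coordinate toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam, n_iter=500):
    """Minimize 0.5 ||Ax - b||^2 + lam ||x||_1 by proximal gradient steps."""
    beta = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - (A.T @ (A @ x - b)) / beta, lam / beta)
    return x

def fista(A, b, lam, n_iter=500):
    """ISTA plus Nesterov momentum: take the prox-gradient step from an
    extrapolated point y instead of the last iterate x."""
    beta = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        x_new = soft_threshold(y - (A.T @ (A @ y - b)) / beta, lam / beta)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

The only difference between the two functions is the momentum bookkeeping in `fista`; the per-iteration cost is essentially identical, which is what makes the improved iteration count a genuine speedup.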
The lower bound on the oracle complexity of minimizing continuously differentiable, β-smooth convex functions is Ω(1/√ε) [Theorem 2.1.6, Nesterov04; Theorem 3.8, Bubeck14; Nesterov08]. General first-order gradient descent does not achieve this – e.g. L1-Prox, or ISTA, achieves O(1/ε). Nesterov, in a series of papers [Nesterov83, Nesterov88, Nesterov07], proposed techniques that improve the convergence rate for smooth functions to O(1/t²). In this post, we discuss Nesterov acceleration. Continue reading
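As a sketch of the accelerated scheme on a simple smooth problem (the function `nesterov_agd`, the test quadratic, and the step size 1/β follow the standard textbook scheme; none of this is the post’s code):

```python
import numpy as np

def nesterov_agd(grad, x0, beta, n_iter):
    """Nesterov's accelerated gradient descent for a beta-smooth convex f."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iter):
        x_new = y - grad(y) / beta                    # gradient step from the extrapolated point
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x

# Ill-conditioned quadratic f(x) = 0.5 x^T diag(d) x, minimized at 0
d = np.array([1.0, 100.0])
f = lambda x: 0.5 * np.sum(d * x * x)
grad = lambda x: d * x
beta = d.max()                                        # smoothness constant
x0 = np.ones(2)

x_agd = nesterov_agd(grad, x0, beta, 50)

# Plain gradient descent with the same step size, for comparison
x_gd = x0.copy()
for _ in range(50):
    x_gd = x_gd - grad(x_gd) / beta
```

With the same step size 1/β, plain gradient descent crawls along the low-curvature direction while the accelerated iterate makes much faster progress; this is the O(1/t) versus O(1/t²) gap in miniature.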
General first-order methods use only the values of the objective/constraints and their subgradients to optimize non-smooth (including constrained) functions. Such methods, which are oblivious to the structure of the problem, are termed black-box optimization techniques. Continue reading
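A minimal black-box example is the projected-style subgradient method, which touches the objective only through values and subgradients (the function `subgradient_method`, the 1/√k step-size rule, and the ℓ1 test problem below are illustrative choices of mine):

```python
import numpy as np

def subgradient_method(f, subgrad, x0, n_iter=2000):
    """Black-box subgradient method: uses only f values and subgradients.
    Step size 1/sqrt(k); since subgradient steps need not decrease f,
    we track and return the best iterate seen."""
    x = x0.copy()
    best_x, best_f = x.copy(), f(x)
    for k in range(1, n_iter + 1):
        x = x - subgrad(x) / np.sqrt(k)
        if f(x) < best_f:
            best_x, best_f = x.copy(), f(x)
    return best_x, best_f

# Non-smooth example: f(x) = ||x - c||_1, minimized at x = c
c = np.array([1.0, -2.0, 0.5])
f = lambda x: np.abs(x - c).sum()
subgrad = lambda x: np.sign(x - c)   # a valid subgradient of the l1 distance

x_best, f_best = subgradient_method(f, subgrad, np.zeros(3))
```

Note that nothing in `subgradient_method` knows the problem is an ℓ1 distance; that obliviousness is exactly what “black-box” means here, and it is also why such methods are slow compared with structure-aware ones like ISTA.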