[1] **Bandits with concave rewards and convex knapsacks,** Shipra Agrawal and Nikhil R. Devanur

Introduces a general model of bandit exploration–exploitation with concave rewards and convex knapsack constraints, and gives polynomial-time, near-optimal algorithms for it. The algorithm extends the Upper Confidence Bound (UCB) algorithm for the classic multi-armed bandit problem. The new framework also yields more efficient algorithms for related problems such as Blackwell approachability, online convex optimization, and conditional-gradient (projection-free / Frank–Wolfe) methods.
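For context, a minimal sketch of the classic UCB1 policy that the paper's algorithm builds on (this is the standard multi-armed-bandit version only; the concave-reward / convex-knapsack extension is not implemented here, and the Bernoulli arm means are an illustrative assumption):

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run the standard UCB1 policy on simulated Bernoulli arms.

    Returns the number of times each arm was pulled. This sketches the
    base UCB algorithm that Agrawal and Devanur extend; their paper
    additionally handles concave rewards and convex knapsack constraints.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # pulls per arm
    totals = [0.0] * k  # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize estimates
        else:
            # pick the arm maximizing empirical mean + confidence bonus
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
# the highest-mean arm should receive the bulk of the pulls
```

The confidence bonus `sqrt(2 ln t / n_i)` shrinks as an arm is sampled more, so exploration concentrates on arms whose estimates are still uncertain.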