S&DS 631: Computation and Optimization. Spring 2020.
The syllabus for the course is here.
This page lists everything else you need to know for the course. If you don't see it here, check the syllabus before asking.
Time: Tuesday & Thursday, 1:00-2:15
Where: WTS 48
Dan's office hours: Fridays 2:00 - 3:00, by Zoom.
TA: Tracey (Xinyi) Zhong: office hours Monday 2:45-3:45, room 225 of Dunham Lab (DL), 10 Hillhouse Avenue
The following is the list of recommended readings, notes, and homeworks for each lecture. Those for future dates are speculative.
For background, I recommend Solomon, Chapter 1. You should have seen most of this before, but I will cover the material from Section 1.4.3 in more detail in class.
Many readings come from the following resources, which are freely available online (although some will require access from inside Yale or the VPN): Solomon (Numerical Algorithms), NW (Nocedal & Wright, Numerical Optimization), LY (Luenberger & Ye, Linear and Nonlinear Programming), and BV (Boyd & Vandenberghe, Convex Optimization).
Jan 14: Introduction
Jan 16: Automatic Differentiation
- Reading is NW, Chapter 8, Section 1, through "Approximating the Gradient", and
- NW, Chapter 8, Section 2, through "The Reverse Mode".
- Additional suggested reading: MIT 18.337: Automatic Differentiation and Applications
- Suggested background on differentiation: section 1.4 of Solomon; section on derivatives in Appendix A of NW (starts around A.44).
- W. Baur and V. Strassen. The complexity of partial derivatives. Theoretical Computer Science, 22(3):317-330, February 1983.
- My lecture notes, Jupyter notebook, and pdf of the Jupyter notebook.
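To make the forward/reverse distinction concrete, here is a minimal reverse-mode sketch (mine, not the course notebook): each node records its inputs and local partial derivatives, and a backward pass pushes adjoints through the chain rule, as in the "Reverse Mode" section of NW 8.2.

```python
# Minimal reverse-mode automatic differentiation (an illustrative sketch).
# Each Var stores its value and, for each input, the local partial derivative;
# backward() propagates adjoints to the inputs by the chain rule.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents    # pairs (input Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, adjoint=1.0):
        # Accumulate this node's adjoint, then pass it back.
        # (Recursing like this is correct but can revisit shared nodes;
        # real implementations sweep once in reverse topological order.)
        self.grad += adjoint
        for parent, local in self.parents:
            parent.backward(adjoint * local)

# f(x, y) = (x + y) * x, so df/dx = 2x + y and df/dy = x.
x, y = Var(3.0), Var(4.0)
f = (x + y) * x
f.backward()
print(x.grad, y.grad)    # 10.0 3.0
```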
Jan 21: Norms, accuracy, and solving linear equations by gradient descent.
- My lecture notes
- Reading about norms: Solomon, Chapter 4.3.
- Solving linear equations by GD: Solomon, Chapter 11.1, or LY 8.2, or NW 3.3. Note these often talk about a PSD matrix M or Q, which is our A^T A. (A small sketch follows this lecture's items below.)
- Reading about forward and backward error: Solomon 2.2, or Higham's Accuracy and Stability of Numerical Algorithms, 1.5.
- First problem set out today.
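Here is a minimal sketch (mine, not the lecture's code) of solving Ax = b by gradient descent on f(x) = ||Ax - b||^2, with the fixed step size 1/L, where L = 2 lambda_max(A^T A) is the Lipschitz constant of the gradient:

```python
import numpy as np

# Gradient descent for f(x) = ||Ax - b||^2. The gradient is 2 A^T (Ax - b),
# and its Lipschitz constant is L = 2 * lambda_max(A^T A) = 2 * sigma_max(A)^2,
# where A^T A is the PSD matrix that the readings call M or Q.

def solve_by_gd(A, b, iters=2000):
    step = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)    # step = 1 / L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x -= step * 2 * A.T @ (A @ x - b)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x = solve_by_gd(A, b)
print(np.linalg.norm(A.T @ (A @ x - b)))    # the gradient goes to zero
```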
Jan 23: Condition numbers.
- Suggested reading: Demmel, Applied Numerical Linear Algebra, Section 2.2.
- Solomon, Section 4.3.2.
- My lecture notes
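A small numeric illustration (mine, not from the readings) of how the condition number kappa(A) = sigma_max/sigma_min governs the sensitivity of Ax = b: the worst case of the bound ||dx||/||x|| <= kappa(A) ||db||/||b|| is attained when b lies along the top singular direction and the perturbation db along the bottom one.

```python
import numpy as np

# Worst-case amplification of relative error by the condition number.
A = np.diag([1.0, 1e-6])      # kappa(A) = sigma_max / sigma_min = 1e6
b = np.array([1.0, 0.0])      # aligned with the large singular value
db = np.array([0.0, 1e-8])    # aligned with the small singular value

x = np.linalg.solve(A, b)
dx = np.linalg.solve(A, b + db) - x

amplification = (np.linalg.norm(dx) / np.linalg.norm(x)) / \
                (np.linalg.norm(db) / np.linalg.norm(b))
print(np.linalg.cond(A), amplification)    # both equal 1e6
```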
Jan 28: Solving linear equations by LU factorization. Pivoting
Jan 30: More pivoting (partial), and Positive Definite Matrices.
- Reading: Solomon, Chapter 3, or Trefethen & Bau Chapters 22 & 23
- See also Trefethen’s class essay “Three Mysteries of Gaussian Elimination”, about which I note:
- Strassen’s algorithm starts to win in practice around n = 100.
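A compact sketch of LU factorization with partial pivoting, in the spirit of Trefethen & Bau (mine, not the lecture's code); swapping in the row with the largest pivot keeps every multiplier in L bounded by 1 in absolute value.

```python
import numpy as np

# Gaussian elimination with partial pivoting, producing P A = L U.
def lu_partial_pivot(A):
    A = A.astype(float).copy()
    n = A.shape[0]
    P, L = np.eye(n), np.eye(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))   # row with the largest pivot
        if p != k:                            # swap rows k and p
            A[[k, p], k:] = A[[p, k], k:]
            L[[k, p], :k] = L[[p, k], :k]
            P[[k, p]] = P[[p, k]]
        for i in range(k + 1, n):             # eliminate below the pivot
            L[i, k] = A[i, k] / A[k, k]
            A[i, k:] -= L[i, k] * A[k, k:]
    return P, L, np.triu(A)

A = np.array([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]])
P, L, U = lu_partial_pivot(A)
print(np.allclose(P @ A, L @ U))    # True
```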
Feb 4: The Randomized Kaczmarz method of solving linear equations, and condition numbers under perturbation.
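A minimal sketch (mine) of the Strohmer-Vershynin randomized Kaczmarz iteration: project the current iterate onto the solution set of one equation at a time, sampling rows with probability proportional to their squared norms.

```python
import numpy as np

# Randomized Kaczmarz for a consistent system Ax = b.
def randomized_kaczmarz(A, b, iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    row_norms2 = np.sum(A**2, axis=1)
    probs = row_norms2 / row_norms2.sum()
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        i = rng.choice(len(b), p=probs)
        # Orthogonal projection of x onto the hyperplane a_i^T x = b_i.
        x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
x_true = rng.standard_normal(10)
print(np.linalg.norm(randomized_kaczmarz(A, A @ x_true) - x_true))  # small
```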
Feb 6: Singular values and vectors.
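One concrete way to compute the top singular pair, as a hedged sketch of mine: power iteration on A^T A.

```python
import numpy as np

# Power iteration: v converges to the top right singular vector of A
# (given a gap sigma_1 > sigma_2); then sigma_1 = ||Av|| and u = Av / sigma_1.
def top_singular(A, iters=200, seed=0):
    v = np.random.default_rng(seed).standard_normal(A.shape[1])
    for _ in range(iters):
        v = A.T @ (A @ v)
        v /= np.linalg.norm(v)    # renormalize to avoid overflow
    sigma = np.linalg.norm(A @ v)
    return sigma, A @ v / sigma, v

A = np.random.default_rng(1).standard_normal((30, 20))
sigma, u, v = top_singular(A)
print(sigma, np.linalg.norm(A, 2))    # the two values agree
```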
Feb 11-20: Linear Programming
My approach to linear programming is a little unusual. First, I like to treat it geometrically rather than algebraically. Second, I like to describe it in the polar form rather than the standard form. I cannot find any nice treatments of the polar form, so I suggest multiple sources and hope that one resonates with you. (A small solver example follows the list below.)
- Notes from lectures I gave long ago: BAP11, BAP12, BAP13.
- The geometric treatment of Jeff Erickson, in his notes: Chapter H, sections 1, 2, 4, 5, 7, 8, and Chapter I, sections 1, 2, 3.
- LY: Sections 2.1, 2.3, 2.4, and 2.5
- Notes by Michel Goemans, Sections 1-10
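For those who want to experiment, here is a toy LP of mine in the standard inequality form (not the polar form the lectures use), solved with scipy:

```python
from scipy.optimize import linprog

# maximize    x1 + 2*x2
# subject to  x1 + x2 <= 4,  x1 <= 2,  x2 <= 3,  x1, x2 >= 0
c = [-1.0, -2.0]                          # linprog minimizes, so negate c
A_ub = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
b_ub = [4.0, 2.0, 3.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)                    # optimum at the vertex (1, 3), value 7
```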
Feb 11: Linear Programming and Convex Sets
- Reading: BV Sections 2.1, 2.2 and 2.5, and 4.3 (not 4.3.1 or 4.3.2)
- My lecture notes
Feb 13: Linear Programming: the simplex method and duality, in the Polar
Feb 18: Linear Programming: strong duality, feasibility, Farkas' Lemma.
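A numeric check of strong duality on the toy LP above (my example, again in standard rather than polar form): the primal max c^T x subject to Ax <= b, x >= 0 and the dual min b^T y subject to A^T y >= c, y >= 0 attain the same value.

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([4.0, 2.0, 3.0])
c = np.array([1.0, 2.0])

primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 3)
print(-primal.fun, dual.fun)    # both equal 7: strong duality
```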
Feb 20: Linear Programming: fast first-order methods, and overview of what's known
Feb 25: Introduction to Convex Programming. Convex Functions
Feb 27: Gradient descent for convex functions.
- The results that I will present appear as
- Theorem 1 in Section 8.2 of LY, and
- The analysis in Section 9.3.1 of BV
- my lecture notes
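A sketch (mine) of gradient descent with the backtracking line search analyzed in BV 9.3.1; the parameters alpha and beta follow BV's notation, and the test function is a toy of my own.

```python
import numpy as np

# Gradient descent with backtracking (Armijo) line search.
def gd_backtracking(f, grad, x, alpha=0.25, beta=0.5, tol=1e-8, iters=1000):
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        t = 1.0
        # Shrink t until the sufficient-decrease condition holds.
        while f(x - t * g) > f(x) - alpha * t * g @ g:
            t *= beta
        x = x - t * g
    return x

# Toy smooth convex function: f(x) = 0.5 x^T Q x with ill-conditioned Q.
Q = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x
print(gd_backtracking(f, grad, np.array([1.0, 1.0])))    # close to [0, 0]
```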
March 2:
- In-class Midterm (10% of grade)
March 24 & 26: Lagrange multipliers and the KKT conditions for optimality.
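A tiny worked example of mine: minimize x^2 + y^2 subject to x + y = 1. Stationarity of the Lagrangian L(x, y, nu) = x^2 + y^2 + nu(x + y - 1), together with the constraint, is a linear system, and at its solution the gradient of the objective is a multiple of the constraint gradient.

```python
import numpy as np

# KKT system for: minimize x^2 + y^2 subject to x + y = 1.
#   2x      + nu = 0      (stationarity in x)
#        2y + nu = 0      (stationarity in y)
#   x  + y       = 1      (primal feasibility)
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x, y, nu = np.linalg.solve(K, rhs)
print(x, y, nu)    # 0.5 0.5 -1.0
```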
March 31: Polynomial time algorithms for convex programming.
- Recommended reading:
- Lauritzen's book, Section 10.5.
- Section 4.1 of the book Combinatorial Optimization, Theory and Algorithms, by Korte and Vygen. Available as a pdf from Springer when inside Yale VPN.
- Lecture notes by Ryan O'Donnell: mostly lecture 9, but lecture 8 could also be helpful.
- my lecture notes, and what I wrote in class
- References to algorithms for solving convex optimization problems that I mentioned in class include:
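As a concrete companion to these readings, here is a bare-bones central-cut ellipsoid method (a sketch of mine using the standard update formulas, not any code from class): given a subgradient oracle, each step cuts the current ellipsoid through its center and replaces it with the smallest ellipsoid containing the remaining half.

```python
import numpy as np

# Central-cut ellipsoid method for minimizing a convex f over a ball of
# radius R, given a subgradient oracle. E = {x : (x-c)^T P^{-1} (x-c) <= 1}.
def ellipsoid_min(f, subgrad, c, R=10.0, iters=300):
    n = len(c)
    P = R**2 * np.eye(n)
    best_x, best_f = c.copy(), f(c)
    for _ in range(iters):
        g = subgrad(c)
        if not np.any(g):               # zero subgradient: c is optimal
            return c, f(c)
        gn = g / np.sqrt(g @ P @ g)     # normalize g in the P metric
        c = c - (1.0 / (n + 1)) * P @ gn
        P = (n**2 / (n**2 - 1.0)) * \
            (P - (2.0 / (n + 1)) * np.outer(P @ gn, P @ gn))
        if f(c) < best_f:
            best_x, best_f = c.copy(), f(c)
    return best_x, best_f

# Toy test: f(x) = ||x - x_star||_1, minimized at x_star.
x_star = np.array([1.0, -2.0])
f = lambda x: np.sum(np.abs(x - x_star))
subgrad = lambda x: np.sign(x - x_star)
print(ellipsoid_min(f, subgrad, np.zeros(2)))    # close to (x_star, 0)
```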
April 2: Introduction to NP.
April 7: MaxCut and some NP-hard problems.
- We will use the hardness of MaxCut to prove that the following problems are NP-hard:
- Maximizing a convex function over a convex polytope.
- Finding the maximum norm point in a polytope.
- Approximating the p-norm of a matrix, for p > 2.
- An introduction to MaxCut can be found on Wikipedia, and some discussion of our first goal is in this blog post.
- The result for matrix p-norms appears in Section 6 of the paper Approximating Matrix p-norms by Bhaskara and Vijayaraghavan (warning: there are a few typos in that section, which I think I have fixed in my notes).
- my lecture notes, and what I wrote in class
April 9: Exact Cover -> Sparse solutions to linear equations, and solving most linear equations, k-Color, NMF, and low-rank matrix completion.
- I can't find good references for most of this. The papers I reference are:
- Peeters, "Orthogonal Representations over Finite Fields and the Chromatic Number of Graphs"
- Shitov, "A short proof that NMF is NP-hard" (warning: I haven't checked it carefully)
- Vavasis, "On the complexity of nonnegative matrix factorization"
- For more on matrix completion, I recommend "Computational Limits for Matrix Completion" by Hardt, Meka, Raghavendra, and Weitz.
- my lecture notes
April 14: Sparse (exact) solutions to linear systems: Orthogonal Matching Pursuit
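A sketch of OMP (mine, not the lecture's code): greedily add the column most correlated with the residual, then re-fit by least squares on the support chosen so far, which makes the new residual orthogonal to every chosen column.

```python
import numpy as np

# Orthogonal Matching Pursuit for a k-sparse solution to Ax = b.
def omp(A, b, k):
    support, residual = [], b.copy()
    x = np.zeros(A.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))    # most correlated column
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        residual = b - A[:, support] @ coef           # re-fit on the support
    x[support] = coef
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[3, 17, 42]] = [1.0, -2.0, 3.0]
print(np.nonzero(omp(A, A @ x_true, 3))[0])   # recovers {3, 17, 42} w.h.p.
```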
April 16: Sparse (approximate) solutions to linear systems: LASSO
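A sketch (mine) of one standard LASSO solver, ISTA: a gradient step on the smooth part of (1/2)||Ax - b||^2 + lam ||x||_1, followed by soft-thresholding, the proximal operator of the l1 norm.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Iterative soft-thresholding for the LASSO objective.
def ista(A, b, lam, iters=2000):
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1/L for the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - step * A.T @ (A @ x - b), step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[3, 17, 42]] = [1.0, -2.0, 3.0]
x = ista(A, A @ x_true, lam=0.5)
print(np.nonzero(np.abs(x) > 1e-3)[0])    # a sparse support, close to {3, 17, 42}
```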
April 21: Approximating MaxCut
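A sketch of Goemans-Williamson rounding (my toy code; the graph and solver choices are illustrative): solve the SDP relaxation max (1/4) sum_ij w_ij (1 - X_ij) over PSD X with X_ii = 1, then round with a random hyperplane through the Gram vectors of X.

```python
import numpy as np
import cvxpy as cp

def gw_maxcut(W, seed=0):
    n = W.shape[0]
    X = cp.Variable((n, n), PSD=True)
    objective = cp.Maximize(0.25 * cp.sum(cp.multiply(W, 1 - X)))
    cp.Problem(objective, [cp.diag(X) == 1]).solve()
    # Gram vectors via an eigendecomposition (X is only numerically PSD).
    vals, vecs = np.linalg.eigh(X.value)
    V = vecs * np.sqrt(np.maximum(vals, 0))
    r = np.random.default_rng(seed).standard_normal(n)
    side = np.sign(V @ r)                    # random hyperplane rounding
    return side, 0.25 * np.sum(W * (1 - np.outer(side, side)))

W = np.array([[0, 1, 1, 0],                  # the 4-cycle: max cut = 4
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
print(gw_maxcut(W))
```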
April 23: Matrix completion by nuclear norm minimization, and other stuff I want to say in the last class.
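A toy sketch of mine for the nuclear norm approach: among all matrices agreeing with the observed entries, pick one of minimum nuclear norm, the convex surrogate for rank.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 8))    # rank 2
mask = (rng.random(M.shape) < 0.7).astype(float)                 # observed entries

X = cp.Variable(M.shape)
# Minimize ||X||_* subject to agreeing with M on the observed entries.
prob = cp.Problem(cp.Minimize(cp.normNuc(X)), [cp.multiply(mask, X - M) == 0])
prob.solve()
print(np.linalg.norm(X.value - M) / np.linalg.norm(M))   # typically small
```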
May 6: Final Exam -- CANCELLED