Time Estimates: Videos: 15 min | Readings: 40 min | Activities: 60 min | Check-ins: 1
Gradient Descent
It’s often the case that there isn’t a convenient linear algebra manipulation for solving an optimization problem, e.g. computing regression coefficients that minimize the sum of the squared residuals. Recall, again, that we had the following expression for computing our multiple regression coefficients:
\(\hat{\beta} = (X'X)^{-1} X'Y\)
This expression gave us the coefficients that minimized the sum of squared errors in our multiple regression model:
\(\sum_{i=1}^n (y_i - \hat{y}_i)^2\)
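To see this expression in action, here is a minimal R sketch using made-up simulated data (the variables `x1`, `x2`, and `y` below are purely illustrative) that computes \(\hat{\beta}\) with the matrix formula and checks it against `lm()`:

```r
# Simulated data, for illustration only
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 3 * x1 - 1 * x2 + rnorm(n)

# Design matrix with an intercept column
X <- cbind(1, x1, x2)

# Closed-form solution: beta-hat = (X'X)^{-1} X'Y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat

# Should match the coefficients reported by lm()
coef(lm(y ~ x1 + x2))
```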
Another category of approaches attempts to solve this problem in an iterative way:
1. Start the \(\hat{\beta}_0, \dots, \hat{\beta}_p\) coefficients with some initial values.
2. Compute the error (the expression above).
3. Change the values of the coefficients in a way that reduces the error.
4. Repeat steps 2 and 3 until we converge.
While this may sound simple enough, steps 3 and 4 are not trivial. One specific instance of this procedure is known as gradient descent, and it can also be used to estimate our regression coefficients.
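To make the procedure concrete, here is a minimal sketch of gradient descent for the least-squares problem, assuming the simulated `X` (with its intercept column) and `y` from the sketch above. The fixed step size, tolerance, iteration cap, and the `gradient_descent` function name are illustrative choices, not something taken from the video:

```r
# Gradient descent for the sum of squared errors
# Gradient with respect to beta: -2 * X'(y - X beta)
gradient_descent <- function(X, y, step = 0.001, tol = 1e-8, max_iter = 10000) {
  beta <- rep(0, ncol(X))                     # Step 1: initial values
  for (i in seq_len(max_iter)) {
    residuals <- y - X %*% beta               # Step 2: current errors
    grad <- -2 * t(X) %*% residuals
    beta_new <- beta - step * grad            # Step 3: move in the direction that reduces the error
    if (sum((beta_new - beta)^2) < tol) {     # Step 4: stop once the coefficients barely change
      beta <- beta_new
      break
    }
    beta <- beta_new
  }
  drop(beta)
}

gradient_descent(X, y)   # should be close to the closed-form beta_hat
```

Each iteration nudges the coefficients a small step against the gradient of the sum of squared errors, so the result should land very close to the closed-form \(\hat{\beta}\) computed earlier.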
Once again, we’re not going to dwell too much on the mathematical derivations, but now that you’ve gained some knowledge of matrix operations in R, you should try to understand the general ideas here. You are not responsible for knowing the calculus and linear algebra used in the video.